DETERMINISTIC  MULTIPROCESSOR  SCHEDULING 
FOR  MIMD  COMPUTER  SYSTEMS 


By 

FRANK  D.  ANGER 


A  DISSERTATION  PRESENTED  TO  THE  GRADUATE  SCHOOL 
OF  THE  UNIVERSITY  OF  FLORIDA  IN 
PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR  THE  DEGREE  OF  DOCTOR  OF  PHILOSOPHY 


UNIVERSITY  OF  FLORIDA 


1987 


ACKNOWLEDGMENTS 


The  opportunity  to  give  thanks  to  people  and  institutions  which  have  taken  part 
in  the  creation  of  this  dissertation  is  one  which  I  heartily  embrace;  reaching  this 
point, however, has cut so deeply into the fabric of my life and the lives of my family,
that weighing the effects of the laughter of my children and the advice of my professors
becomes a difficult enterprise. Nonetheless, I would like first to specifically
thank those who consciously acted as counselor or gave assistance in this work. My
dissertation advisor, Dr. Yuan-Chieh Chow, has provided the environment, guidance,
and many ideas which made the research possible. Each of the other dissertation
committee members also contributed to the work: Dr. Louis Martin-Vega has been a
frequent source of information, perspective, and enthusiasm; Dr. Douglas Dankel has
provided constant support, detailed editorial criticism, and perennial good judgement;
Dr. Stephen Thebaut has often brought sound advice and, just as often, good humor;
and Dr. Gerhard Ritter has cooperated throughout the research.

Beyond  the  committee,  I  am  grateful  to  have  had  the  opportunity  to  work  with 
three undergraduates whose senior projects have made direct and substantial contribution
to the research presented here: Dennis Suppe, Borden Wilson, and Michael
Ellis did fine jobs of developing different aspects of the scheduling simulator and carrying
out arduous portions of the data collection and analysis. Moreover, Dennis's
energy,  optimism,  and  conversation  kept  all  four  of  us  going. 

There  is,  however,  one  person  to  whom  my  debt  is  indeed  great.  Dr.  Jing-Jang 
Hwang  has  provided  ideas,  critique,  vision,  and  hard  work.  He  has  also  been  the 
other half of many long discussions which have given substance to the research and
larger difficulties; and, finally, he is responsible for putting this work into that mysterious
form that makes it print out so beautifully.

On  the  other  hand,  there  are  many  people  who  have  contributed  to  this  research 
perhaps  unknowingly.  Dr.  Carlos  Segami  set  the  example  and  gave  the  first  impetus 
toward  changing  roles  from  professor  to  student,  while  the  University  of  Puerto  Rico 
gave its moral and monetary support to my adventure. From that institution, Professor
Brunilda Nazario and Oliva Loperena, in particular, assisted greatly to make
our  time  in  Gainesville  more  trouble  free.  Dr.  Roger  Elliott,  then  chairman  of  the 
CIS  Department  at  the  University  of  Florida,  likewise  gave  encouragement  of  many 
kinds. 

My  mother,  Julia  Anger,  has  patiently  watched  this  venture  and,  as  always, 
given  it  her  blessing.  Whether  or  not  they  think  they  helped  at  all,  I  must  thank  my 
three sons (Angel, Gus, and Art) for doing what young people do so well, that fills us
as parents with awe; and I also thank them for understanding when the computer
and books must have seemed more important to me than they. Finally there is one
person whose contribution and support were constant. My wife, Rita, gave me the
greatest encouragement. Her incredible dynamism and her (as I would often tell
her) unfounded confidence kept me going when I would have gladly wavered. As my
companion  in  study,  my  co-worker,  my  most  tenacious  critic,  and  my  source  of 
inspiration  she  has  helped  in  more  ways  than  can  be  described  in  character  strings. 
To  her  belongs  my  eternal  gratitude. 




TABLE  OF  CONTENTS 


ACKNOWLEDGMENTS

ABSTRACT

CHAPTER

I     BACKGROUND
          Scheduling
          Computer Architecture
          Concurrent Programming
          Performance Analysis

II    TURNAROUND TIME IN A GENERAL PURPOSE MPS
          Dynamic and Static Scheduling Problems
          Analytic Approach to Heuristic Algorithms
          Multiprocessor Simulator
          Simulation Results
          Towards a Theory of Program Size
          Conclusions

III   LOOSELY COUPLED SYSTEMS
          Scheduling and Communication
          An Algorithm for Precedence Trees
          Extensions

IV    OTHER MULTIPROCESSOR SCHEDULING PROBLEMS
          More MIMD Scheduling Problems
          SIMD and Specialized Architecture Problems
          Open Questions

BIBLIOGRAPHY

GLOSSARY

APPENDICES

A     RESULTS OF FRIEDMAN ANALYSIS

B     STATISTICAL TEST RESULTS

BIOGRAPHICAL SKETCH




Abstract of Dissertation Presented to the Graduate School

Chairman: Dr. Yuan-Chieh Chow

Major  Department:  Computer  and  Information  Sciences 

The  research  reported  contributes  to  the  theory  of  scheduling  as  it  applies  to 
modern  general-purpose  multiple  processor  systems.  Two  distinct  deterministic 
models  are  considered. 

With the first model, for tightly coupled systems, a study is made of the
efficiency of scheduling policies for minimizing the average turnaround time of a set
of independent jobs, each consisting of a collection of schedulable subtasks obeying a
precedence relation. A number of policies are defined based on the well-known Shortest
Job First (SJF) algorithm, and a simulation study is made comparing their performance.
Analysis of the results reveals no clear winner. On the other hand, best
case and worst case bounds are obtained for one of the algorithms, called CSJF,
which indicate, in particular, that it can do no worse than its sequential counterpart,
SJF. Moreover, CSJF is shown to be asymptotically optimal if the length of the individual
jobs is bounded. Finally, a new measure of the size of a job is proposed as the
basis of a new heuristic algorithm for this problem. It is further shown that the size
so defined is more closely related to the optimal makespan (completion time) of the
job when run on m processors than either the critical path time or the total processing
time.

With the second model, for loosely coupled systems, algorithms are developed to
minimize the makespan of a set of precedence related tasks when run on an m-processor
system in which communication delays are not negligible. A number of
basic assumptions are made on the interconnection architecture and the communication
protocol in order to treat the system deterministically. Additionally, the principal
results apply under the assumptions Fully Connected and Identical Links. An
efficient algorithm, Join Latest Predecessor (JLP), is developed and shown to be
optimal in case the precedence relation on the tasks forms an "opposing forest" and
the system satisfies two additional assumptions: Sufficient Processors and Short
Communication Delays. Two polynomial-time extensions of the JLP algorithm,
EJLP and JLP/D, are presented: the first is conjectured to be optimal when there
are not sufficient processors, while the latter is proved to be optimal for arbitrary
precedence relations but with sufficient processors.


CHAPTER I
BACKGROUND 


From smoky factories and crystal-walled executive suites, from humming computer
centers and cluttered principals' offices, from a whole spectrum of sources come
the day-to-day problems which engender the begrudgingly given planning steps
known as "scheduling." In most of these environments, scheduling begins as orders to
"get it done by 4 o'clock" or as the first-come-first-served reflex to a demanding clientele.
And this is where it may end. But sometimes, long experience teaches that
more serious planning may lead to greater productivity, more free time, smoother
operations, or greater profits. Scheduling may have humble beginnings in the tediously
repeated tasks at a myriad of similar workstations, or more glamorous ones in
the  inner  sanctums  of  larger  organizations,  where  long-term  projects  are  born  and 
nurtured.  Here  scheduling  takes  on  a  more  respected  air,  and  its  necessity  and 
benefits  are  more  clearly  recognized.  The  larger  and  longer  a  project,  the  more 
essential  it  becomes  that  all  the  pieces  fit  together  correctly,  and  the  harder  it 
becomes  for  a  single  person  to  visualize  and  coordinate  all  its  components.  Thus,  a 
theory  of  scheduling  is  born  which  attempts  to  study  this  diverse  problem  area  and 
produce  rules  for  completing  effective  planning. 

The body of literature which has grown up self-consciously referring to scheduling
as a discipline covers half a century and an ever-broadening range of problems.
It has produced theoretical and practical solutions to some of these problems, and it
has shown that some of them are beyond the abilities of the most modern computers
to solve in an optimal fashion. Small twists in the constraints and conditions under
which a problem is posed can turn an easy exercise into an amazingly difficult
computation. Many mathematical and computational techniques have been brought
to  bear  on  these  problems:  exhaustive  search,  queuing  theory,  linear  programming, 
combinatorial  methods,  statistics,  simulations,  and  more. 

The principal objective of this dissertation is to contribute to the theory supporting
the efficient use of multiple processor systems (MPS). Although there are
many ways to increase efficiency, this work considers only a few specific methods
related to the scheduling of the programs to be executed by an MPS. Even within
this apparently narrow area there lie a large number of different problems and
methods of solution. In order to describe the research and put it into perspective
within the scope of more efficient use of the MPS, it is necessary to discuss both the
previous results in scheduling theory and the characteristics of MPSs. The first
chapter addresses itself to this effort. Chapter II investigates a class of scheduling
problems which are particularly appropriate for a shared-memory multiprocessor system
running concurrent programs. Both analytic and simulation methods are used in
this chapter. Chapter III follows with some interesting results for scheduling on
loosely coupled systems with significant overhead due to interprocessor communications.
The last chapter indicates a variety of other related problems and possibilities
for future research. Because of the large number of specialized mnemonics in this
work, a glossary has been included for convenient reference. Also included are summaries
of statistical tests associated with the simulation reported in Chapter II.

1.1. Scheduling

What constitutes a scheduling problem is not always well defined. It can range
from simple sequencing of events to a complex decision process affecting the allocation
of a variety of resources and the timing of a number of different types of operations.
The sole interest of this work is in problems of when and where to execute
program "tasks" within an MPS in order to optimize some performance measure.
Some of the problems considered also take into account the communication overhead
incurred by sending messages from one task to another; however, the actual scheduling
of messages is not considered.

One of the most application-dependent and subjective criteria in scheduling is
the objective function: What is to be optimized? The motivation may be
profitability, efficiency, user satisfaction, or some other criterion. Some possible examples
appear in Table 1.1.


Table 1.1 Some Performance Objectives

APPLICATION               OBJECTIVE

Robot control             Minimize makespan
Data processing           Maximize throughput
Scientific                Maximize throughput
Real time system          Minimize number of late jobs
On-line database          Minimize response time
Interactive multiprog.    Min. average turnaround time

The  particular  assumptions  that  will  be  made  on  all  the  scheduling  problems  are 
as  follows: 

1.  The  system  has  m  identical  processors  where  m  is  greater  than  one. 

2. At any given moment, the system has a fixed collection of tasks, $T_i$, to execute,
each with a fixed, known processing time, $\mu_i$. (This is a "deterministic" scheduling
problem.)

3.  Once  a  task  is  assigned  to  a  processor,  the  task  must  be  run  to  completion  on 
that  processor  without  interruption.  (This  is  a  "non-preemptive"  problem.) 

4.  Any  given  processor  can  only  run  a  single  task  at  a  time. 

5. If there is a "precedence relation" among the tasks, then no task can be
scheduled to run before all its predecessors have run to completion.

6. In some problems, the whole collection of tasks is known at the outset (this is a "static"
problem), while in others, tasks arrive at different times and nothing is known of
them until they arrive (this is a "dynamic" problem).

7.  The  performance  measure  to  be  optimized  will  always  be  either 

(a)  Total  time  to  complete  the  set  of  tasks  (makespan),  or 

(b)  Average  turnaround  time  running  a  set  of  independent  jobs. 
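
As an illustration of how these assumptions fit together, the following is a minimal Python sketch of a non-preemptive list scheduler on m identical processors. The function name `list_schedule`, the priority list, and the four-task example are assumptions made for this illustration only; the greedy rule shown is generic list scheduling and is not one of the algorithms developed in the later chapters. The precedence relation is assumed to be acyclic.

```python
import heapq

def list_schedule(times, preds, m, priority):
    """Greedy non-preemptive list scheduling on m identical processors.

    times:    dict task -> fixed, known processing time (Assumption 2)
    preds:    dict task -> set of predecessor tasks (Assumption 5)
    priority: list of all tasks; earlier entries are started first
    Returns (start, finish) dicts keyed by task.
    """
    rank = {t: i for i, t in enumerate(priority)}
    waiting_on = {t: set(preds.get(t, ())) for t in times}
    start, finish = {}, {}
    running = []          # heap of (finish_time, task); one task per processor
    free = m              # number of idle processors (Assumptions 1 and 4)
    now = 0
    while len(finish) < len(times):
        # Tasks whose predecessors have all completed and which have not started.
        ready = sorted(
            (t for t in times if t not in start and not waiting_on[t]),
            key=rank.get,
        )
        for t in ready[:free]:
            start[t] = now
            heapq.heappush(running, (now + times[t], t))
        free -= min(free, len(ready))
        # Jump to the next completion; started tasks are never interrupted
        # (Assumption 3).
        now, done = heapq.heappop(running)
        finished = [done]
        while running and running[0][0] == now:
            finished.append(heapq.heappop(running)[1])
        for d in finished:
            finish[d] = now
            free += 1
            for t in waiting_on:
                waiting_on[t].discard(d)
    return start, finish

# Hypothetical example: C follows A; D follows A and B.
times = {"A": 3, "B": 2, "C": 2, "D": 1}
preds = {"C": {"A"}, "D": {"A", "B"}}
start, finish = list_schedule(times, preds, m=2, priority=["A", "B", "C", "D"])
makespan = max(finish.values())                       # objective 7(a): 5
mean_completion = sum(finish.values()) / len(finish)  # 3.5
print(makespan, mean_completion)
```

Here each task is treated as its own job, so the mean completion time is only a stand-in for objective 7(b), which concerns whole jobs.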

In order to give some perspective to the state of scheduling theory today, it is
necessary to discuss these assumptions. Some of them are standard to most scheduling
problems, but others restrict the research considerably. The first rules out the
large class of single-processor scheduling problems, while the second eliminates the
whole area of non-deterministic scheduling problems, which are frequently analyzed
through queuing theory and other statistical methods. Additionally, Assumption 3
limits the investigation to the non-preemptive problems, most of which have a
corresponding preemptive problem. Although the theory is equally well developed
for the areas thereby left out, nothing further will be said about them. On the other
hand, Assumption 7 gives a very specific focus to the rest of this dissertation, making
it appropriate to talk briefly in this chapter about other possible objective functions.

In  many  situations,  particularly  real-time  programming,  it  is  required  that 
answers  be  obtained  within  a  given  time  limit;  otherwise  their  value  degrades  or  they 
become worthless. In such cases, minimizing the number of late jobs or minimizing
the maximum turnaround time are more appropriate objectives than the ones selected
for investigation. On the other hand, a computer center director handling batch
data-processing jobs might be most interested in the total volume of work he can
finish in a given time, as measured in amount of output, number of completed jobs,
or number of seconds of "useful" CPU time. In some applications, a "deadline" is
given for each task, and it may be required to minimize either the total "tardiness"
(total time spent after the deadlines running late tasks) or the total "lateness" (the
sum of the differences [finish time - deadline], some of which may be negative).
Further  discussion  of  possible  objectives  is  found  in  Section  4  of  this  chapter. 
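
The difference between these deadline-based objectives can be seen in a short Python sketch. The function names and the three-task data are hypothetical, and tardiness is taken here in its usual sense of max(0, finish time - deadline), which is an assumed reading of the informal definition above.

```python
def total_lateness(finish, deadline):
    """Sum of (finish time - deadline); early tasks contribute negative terms."""
    return sum(finish[t] - deadline[t] for t in finish)

def total_tardiness(finish, deadline):
    """Sum of the time by which each late task overruns its deadline."""
    return sum(max(0, finish[t] - deadline[t]) for t in finish)

def number_of_late_jobs(finish, deadline):
    """Count of tasks that finish strictly after their deadline."""
    return sum(1 for t in finish if finish[t] > deadline[t])

# Hypothetical finish times and deadlines.
finish = {"A": 3, "B": 5, "C": 9}
deadline = {"A": 4, "B": 5, "C": 6}

print(total_lateness(finish, deadline))       # (-1) + 0 + 3 = 2
print(total_tardiness(finish, deadline))      # 0 + 0 + 3 = 3
print(number_of_late_jobs(finish, deadline))  # only C is late: 1
```

Task A finishing early lowers the total lateness but has no effect on the total tardiness, which is why the two objectives can favor different schedules.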

There are other kinds of scheduling problems which differ from the ones discussed
so far because they assume different kinds of processors and different kinds of
tasks. In these problems, some tasks must be performed only on certain processors
or must be performed on some combination of processors in some given order. Such
"job-shop" scheduling problems relate to many industrial and assembly-line environments,
but can also be used to model the situation in a computing system when
scheduling of the input-output processors is included in the problem. Chapter IV
presents  more  types  of  scheduling  problems. 

In 1981, Lageweg, Lawler, Lenstra, and Rinnooy Kan [LAGE81a, LAGE81b]
published a computerized classification of results for a very wide variety of deterministic
scheduling problems. These problems were presented using a formal scheme
for  describing  and  classifying  the  different  kinds  of  scheduling  problems  based  on 
three  major  parameters: 

(a)  the  number  and  kinds  of  processors; 

(b) the job characteristics: preemptive or not, the type of precedence relation, and
other restrictions on start times and finish times; and

(c)  the  objective  function  to  be  optimized. 

They used a notation that is similar to the one used in queuing theory to describe the
type of problem under discussion: $P \mid \mathit{tree} \mid \sum C_j$, for example, represents the problem
of an arbitrary number of identical processors ($P$) and a set of tasks satisfying a
tree-shaped precedence relation with the objective of minimizing the sum of the completion
times (hence minimizing the average turnaround time). The authors of the
scheme further observe a partial ordering on the difficulty of the problems being
classified: for example, both minimizing maximum lateness and minimizing average
turnaround time can be accomplished by an algorithm capable of minimizing the
total (or average) tardiness. In this way, they are able to present information on the
"maximal" easy problems (ones with known polynomial-time solutions such that no
more "difficult" problem has been solved) and "minimal" hard problems (ones which
are NP-hard and such that no "easier" problem is known to be NP-hard). In 1981, this
scheme applied to the literature on scheduling was able to classify 4536 scheduling
problems into 416 easy, 432 open, and 3688 hard problems [LAGE81b].

Although  most  of  the  "mainstream"  scheduling  problems  can  be  classified  under 
the  foregoing  scheme,  there  is  still  much  literature  on  deterministic  scheduling  which 
does  not  fall  into  these  divisions.  Work  on  scheduling  with  a  number  of  additional 
scarce  "resources,"  scheduling  with  set-up  and  tear-down  times,  and  scheduling  with 
variable  numbers  of  processors  are  examples  of  further  problems  which  have  received 
attention in the literature. Chapter III of this dissertation gives attention to yet
another kind of scheduling problem, in which there are significant time delays associated
with scheduling a task and its immediate successor on separate processors. Such
problems  provide  a  more  apt  model  of  loosely-coupled  computer  systems  than  do  the 
traditional  models. 

1.2. Computer Architecture

Classical scheduling theory considers communication time to be negligible,
implying that the actual computer architecture does not enter into the problem.
When there are substantial delays due to the communication between processors,
however, the method of interconnecting the processors becomes relevant to the problem
formulation. For the purpose of classifying scheduling problems, therefore, it is
important to distinguish between two types of systems: tightly coupled and loosely
coupled. Tightly coupled systems use shared memory to communicate between
processors, and communication times can be considered as negligible at all times. In
loosely coupled systems, on the other hand, each processor has its own memory and
communication must be done via some form of data bus or switching network. In
the most loosely coupled system of all, the computer network, each processor "node"
is a completely independent unit and communication is via external cables or telephone
hookups. In the following discussion, loosely-coupled systems are assumed.

Important for the determination of optimal schedules are whether direct communication
is possible between any two processors and whether there is the possibility
of contention among messages for the use of the communication channels. In the
ideal case (complete contention-free connection between all processors), the calculation
of communication delays is relatively easy, while in a partially connected system
with shared busses, prediction of exact communication delays may be impossible.
Similarly, if the average communication delays are extremely small in comparison to
the average computation time of the tasks, the effect of these delays will be minor in
terms of choosing a good schedule, whereas if the communication delays are much
greater than the average computation times, then planning to minimize these delays
may be more significant than worrying about intelligent distribution of the task
workload. In the two extreme cases (zero communication delays and infinite communication
delays), the scheduling problem reduces to classical multiprocessor
scheduling: in the latter case, to the scheduling of independent tasks. The specific
assumptions needed on the communication between tasks in such systems are
presented in Chapter III.
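
The effect of a communication delay on where a task is best placed can be sketched in a few lines of Python. This is a simplified illustration, not the model of Chapter III: it assumes a single uniform delay between any two distinct processors, no delay on the same processor, and no contention, and it ignores whether the chosen processor is busy. The function name `earliest_start` and the example data are hypothetical.

```python
def earliest_start(task, proc, preds, finish, placed_on, delay):
    """Earliest time `task` could start on `proc`, given where and when its
    immediate predecessors ran. A result from another processor arrives
    `delay` time units after that predecessor finishes."""
    ready = 0
    for p in preds.get(task, ()):
        arrival = finish[p] + (0 if placed_on[p] == proc else delay)
        ready = max(ready, arrival)
    return ready

# Hypothetical situation: C needs results from A (on P0) and B (on P1).
preds = {"C": ["A", "B"]}
finish = {"A": 3, "B": 2}
placed_on = {"A": "P0", "B": "P1"}

for proc in ("P0", "P1", "P2"):
    print(proc, earliest_start("C", proc, preds, finish, placed_on, delay=4))
# P0 -> 6 (waits for B's message), P1 -> 7, P2 -> 7

# With zero delay the placement no longer matters: C can start at time 3 anywhere.
print(earliest_start("C", "P2", preds, finish, placed_on, delay=0))
```

In this small case the best placement joins the task to the processor of its latest-finishing predecessor, since that is the message it can least afford to wait for.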

Other  characteristics  of  the  architecture  also  play  a  role  in  the  determination  of 
optimal  schedules:  for  example,  the  relative  speeds  of  the  different  processors, 
whether  or  not  the  processors  are  equivalent  in  terms  of  the  jobs  they  can  perform, 
and what kind of control of the processors is possible. This last characteristic leads
to a gross classification of multiple processor computing systems according to the
amount of independence of control. Flynn [FLYN66] proposed the widely accepted
acronyms SISD (Single Instruction Single Data), SIMD (Single Instruction Multiple
Data), and MIMD (Multiple Instruction Multiple Data) for increasingly generalized
systems. SISD systems are the equivalent of single processor systems. SIMD systems,
such as vector processors, apply the same operations to different data streams,
allowing efficient parallel processing of large numbers of similar calculations. Finally,
in the MIMD systems, control of the processors is independent, allowing each processor
to apply its own set of instructions to its own data stream. In this dissertation,
the following assumptions are universally observed:

(1)  The  computer  system  is  an  MIMD  system,  tightly  or  loosely  coupled  according 
to  the  problems  being  discussed,  and 

(2)  All  processors  are  assumed  identical:  they  operate  at  the  same  speed  and  any 
task  can  be  performed  equally  well  on  any  of  the  processors. 

1.3. Concurrent Programming

In order to understand the significance of the problem of improving the average
turnaround time as presented in Chapter II, it is necessary to understand the idea of
concurrent programming. The normal high-level programming languages allow the
user to write very sophisticated programs, but all such programs share the property
of being sequential: they are to be executed in a predetermined order, one instruction
at a time. In a single-processor environment, this is perfectly natural, but in a
multi-processor environment it is too restrictive. Concurrent programming makes
use of hardware and system software in such a way as to allow the simultaneous execution
of segments of a program which are independent of one another logically.
High-level language constructs such as FORK-JOIN and COBEGIN-COEND support
user-specified concurrency, while optimizing compilers written especially for particular
systems may locate and exploit implicit concurrencies in a program written
sequentially.
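
For readers unfamiliar with these constructs, the following Python sketch mimics a COBEGIN-COEND block using the standard `concurrent.futures` module. It is a present-day analogue offered only to show the control structure; the helper name `cobegin` and the summation example are assumptions of this illustration, and in CPython the threads demonstrate the fork and join points rather than a true parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def cobegin(*segments):
    """Run logically independent zero-argument callables concurrently and
    return their results, in order, once all of them have finished."""
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        futures = [pool.submit(seg) for seg in segments]  # FORK: start each segment
        return [f.result() for f in futures]              # JOIN: wait for all of them

data = list(range(1, 101))

# The two partial sums do not depend on one another, so they may overlap in time.
low, high = cobegin(
    lambda: sum(data[:50]),
    lambda: sum(data[50:]),
)

# Sequential code resumes only after the join.
print(low + high)  # 5050
```

Each callable passed to `cobegin` corresponds to one schedulable task of the program, and the statement after the call is a successor of all of them in the precedence relation.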

When a multiple processor system (MPS) is used to support a multiprogramming
environment, and if the individual programs are written as concurrent
programs, then the collection of tasks available for scheduling at any moment breaks
into a number of subsets, each subset belonging to a specific program or "job." If we
are interested in average turnaround time as a performance measure, from the user's
point of view it is not the turnaround time of the individual tasks but that of the
complete jobs which is of interest. When a program is run, there is no particular
interest in how soon a given subroutine finishes, but rather in how soon the whole
program finishes. This theme is developed further in Chapter II.

1.4. Performance Analysis

In  designing  computers,  operating  systems,  compilers,  and  other  system  tools,  it 
is  often  necessary  to  evaluate  the  relative  performance  of  one  system  versus  another 
or  versus  some  standard.  Such  evaluation  falls  into  the  general  area  of  performance 
analysis.  A  wide  range  of  techniques,  such  as  benchmarking,  simulation,  figures  of 
merit,  and  others,  is  used  depending  on  the  particular  situation.  An  important  first 
step, however, is deciding exactly what aspects of the system's performance to measure
and under what criteria. As observed in the discussions of scheduling above,
there are many, often conflicting, goals a scheduler may have: the same is true of
other aspects of the system. While high counts of instructions per second may be a
respectable goal, achieving this goal through inefficient code or only for
computation-intensive programs may not really indicate high performance. With the
understanding, then, that there are many kinds of performance and many ways to
evaluate each kind, the succeeding paragraphs discuss some of the performance measures
relevant to scheduling.

Perhaps  the  most  basic  measure  is  that  of  system  throughput,  which  is  usually 
measured  in  jobs  completed  per  unit  time.  Throughput  is  therefore  a  measure  of 
how  much  useful  work  a  computing  system  is  performing  in  a  given  time  interval. 
As  a  basis  for  comparison  of  performance,  however,  throughput  can  be  misleading 
unless like-sized jobs are used in making the comparison. Ideally, in order to compare
the throughput of two scheduling policies, they would be tested on the same set
of  jobs  or  on  jobs  with  very  similar  characteristics. 

The throughput is easily calculated for a dynamic scheduling situation: a
related measure for static scheduling is the makespan, or total time required to complete
a given set of jobs. In the static case, in fact, the throughput is essentially the
reciprocal of the makespan. Both of these criteria measure the same "quality" of system
performance; neither, on the other hand, relates to the satisfaction of an individual
user in terms of the time required for his job to be completed. Scheduling policies
favoring high throughput (low makespan) tend to place long jobs first or run jobs
in first-come-first-served (FCFS) order, unduly lengthening the time required to complete
many shorter jobs.

Minimizing the average turnaround time of jobs in the system is a quite
different kind of performance goal, closely allied to the goal of user satisfaction.
Improving the turnaround time of the jobs in the system, unfortunately, may not
improve throughput at the same time. The turnaround time of a job is defined as
the time from submission of the job to the time of completion. This is also frequently
referred to as the flow time of the job. For static scheduling problems, the
submission time of all jobs is taken as t = 0, so the average turnaround time is just
the average completion time. Note also that minimizing the average turnaround
time is the same as minimizing the total flow time, that is, the sum of the flow times of all
the jobs. Surprisingly, running the shortest job first optimizes turnaround time while,
at least on two processors, running the longest job first reduces the makespan and
hence improves the throughput. In general, in an unsaturated (or under-utilized)
dynamic scheduling situation, scheduling has little effect on throughput but can
improve turnaround time, while in a saturated (or over-utilized) system, these two
criteria tend to be opposed to one another [KRUC78, p. 533].
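
The opposition between the two criteria can be seen on a small static example. The Python sketch below is an illustration under simplifying assumptions: each job is a single indivisible task, all jobs are available at t = 0, and jobs are handed, in the given order, to whichever of the m processors becomes free first. The function name `run_in_order` and the job lengths are hypothetical.

```python
import heapq

def run_in_order(lengths, m):
    """Assign jobs in the given order to the earliest-free of m identical
    processors; return each job's completion time in that same order."""
    free_at = [0] * m
    heapq.heapify(free_at)
    completions = []
    for length in lengths:
        done = heapq.heappop(free_at) + length  # start when a processor frees up
        completions.append(done)
        heapq.heappush(free_at, done)
    return completions

jobs = [1, 1, 2, 4]

sjf = run_in_order(sorted(jobs), m=2)                # shortest job first
ljf = run_in_order(sorted(jobs, reverse=True), m=2)  # longest job first

print(sum(sjf) / len(sjf), max(sjf))  # mean turnaround 2.5,  makespan 5
print(sum(ljf) / len(ljf), max(ljf))  # mean turnaround 3.25, makespan 4
```

On these four jobs, shortest-first gives the lower average turnaround time and longest-first the lower makespan, so no single ordering is best for both measures.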

The response time of a system is often used as a performance measure, particularly
in real-time installations such as interactive systems and control systems. This
criterion is usually defined as the time from job submission to the beginning of the
first output produced by the job, but variations on this definition also appear. The
response time is meant to measure how long a user or external input source must
wait from the time of input to the time it receives some response to its input. It is
therefore, like turnaround time, related to user satisfaction, or, in the case of time-critical
control, to the usefulness of the system. Response time removes some of the
dependency on computational speed that the turnaround measure has, and is related
more directly to another measure: waiting time. The waiting time of a job is the
amount of time it spends in the "wait queue," or, more precisely, the amount of time
from arrival to completion that the job is ready for processing but not being processed.
In most classical scheduling problems, the relationship

    turnaround time = waiting time + processing time

is assumed to hold, but if I/O time is considered as a third status, then the "="
becomes ">." Moreover, if a job can be concurrently scheduled on more than one
processor, the turnaround time can be less than the processing time!

Rather than looking at the jobs themselves, performance can also be measured
by CPU utilization. This is normally expressed as the percent or fraction of the total
time that the CPU is kept busy. For multiprocessor systems, this is measured
individually for each processor or as the average over the processors. For single-processor
systems, this is not a useful criterion for the evaluation of scheduling policies since it
is more related to the demand placed on the system and the degree of
multiprogramming maintained than to the method of ordering the job executions. On the other
hand, "load balancing" techniques for multiple-processor systems work to equalize the
utilization of the various processors and are frequently treated as scheduling
techniques. In this dissertation load balancing and CPU utilization are not considered as
scheduling objectives.

A final performance criterion, speedup, is not an absolute measure of
performance but rather a term applied to the improvement achieved by an MPS over a
single-processor system. Speedup is the result of a number of parameters, the most
important of which are the number of processors, the type of jobs being run, the
characteristics of the interconnections between processors, and the scheduling policy
used. Usually speedup is applied to one of these parameters at a time, holding the
others constant, so as to compare the speedup of two systems, the speedup of two
competing algorithms, or the speedup attained by two different scheduling policies.
An appropriate definition of speedup is

$$ S = \frac{\text{sequential processing time}}{\text{concurrent processing time}} $$

where, if it is the scheduling policy that is being investigated, it is assumed that the
same jobs are run on the single- and multiple-processor systems.

This chapter has given some general ideas about scheduling and the kinds of
systems the scheduling methods apply to. The next two chapters take up the specific
scheduling problems which are the main focus and source of results of this
dissertation. The final chapter broadens the view again, considering a wide range of possible
extensions.


CHAPTER II

TURNAROUND  TIME  IN  A  GENERAL  PURPOSE  MPS 


One measure of effective use of a multiuser system is the average turnaround
time of the jobs in the system. If jobs are indivisible units, the venerable "Shortest
Job First" (SJF) strategy is the best that can be done [CONW67]; however, for
concurrent programs which have parts which can be run simultaneously on different
processors, this strategy is no longer optimal. The significant point is that the objective
of improving the average turnaround time of the jobs in the system is not achieved
by improving the average turnaround time of the tasks which form the (possibly)
concurrent pieces of the job. This chapter is devoted to the study of the problem of
minimizing the average turnaround time for collections of concurrent programs
running on a multiprocessor system.

As discussed at the end of Chapter I, lowering the average turnaround time is
an objective which favors user satisfaction, since an individual user of a multiuser
system is interested in the time from submission to completion of her job, not in how
many jobs the computer can complete in that time or even whether it completed any
other jobs. It is also closely related to the effective use of computing resources, since
each job may tie up a number of peripheral devices or files while it is running,
making other jobs wait for their release.

This chapter presents research on the problem of minimizing the average
turnaround time of jobs consisting of precedence-related tasks as described in the first
paragraph. The problem is attacked in two ways. First, best-case and worst-case
bounds are provided for the most obvious extension of the usual SJF algorithm for
static scheduling. Second, a scheduling simulator and its results are discussed in
order to compare a number of heuristic dynamic scheduling policies. Section 2.1
compares the dynamic and static scheduling problems; Section 2.2 presents the best-
and worst-case bounds analysis, while the following two sections present the
simulation method and results. Section 2.5 takes a deeper look into the way in which the
size of a job can be measured. It is shown that a new definition of job size may
provide a good basis for an improved scheduling heuristic. A summary is provided in
Section 2.6.


2.1. Dynamic and Static Scheduling Problems

The static version of a scheduling problem assumes that a collection of jobs is
given and available at the outset and that a complete schedule could be created at
that time. The dynamic version assumes that the jobs arrive at different times and
that scheduling must be done in real time along with the running of the jobs. In the
case of the traditional problem of minimizing the average turnaround time of a
number of indivisible jobs (the non-preemptive case), if the jobs are all independent,
then the static problem is solved by the multiprocessor version of Shortest Job First
(SJF, also known as SPT for Shortest Processing Time) [BAKE74a]. This orders the
jobs from shortest to longest and schedules the next available job in that order
whenever a processor becomes available. If there is any type of precedence relation
among the jobs, the problem is NP-hard unless there are no more than two
processors and all jobs are unit time [LAGE81b].

The dynamic version of the problem is solved for independent jobs in the
preemptive case on a single processor by the related Shortest Remaining Time First
policy (SRTF), which is the preemptive version of SJF [CONW67]. For m
processors, even this method can fail unless all jobs are available at the same time
[MART77]. SRTF guarantees that all processors are busy when possible and the
remaining processing time of any waiting job is longer than the remaining processing
time of any job being run. The non-preemptive dynamic case has no "solution" even
on a single processor in the sense that no matter what policy is adopted, sometimes it
would have been better to wait, leaving a processor idle for a small interval of time,
for the arrival of another job and scheduling it next rather than any of the jobs
already available. It is obvious that no scheduling policy can be smart enough to
know when to wait for a future event. Nonetheless, SJF is asymptotically optimal
for many possible distributions of arrivals and processing times [AGRA84].
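The remark that it is sometimes better to leave the processor idle can be checked on a two-job instance. The sketch below is only an illustration: the arrival times, job lengths, and the helper name `total_flow_time` are assumptions introduced here, not data from the study.

```python
def total_flow_time(schedule):
    """Sum of (completion - arrival) over a single-processor, non-preemptive schedule.

    `schedule` is a list of (arrival, start, length) triples, one per job.
    """
    return sum(start + length - arrival for arrival, start, length in schedule)

# Hypothetical instance: job A arrives at time 0 with length 10,
# job B arrives at time 1 with length 1.
start_at_once = [(0, 0, 10), (1, 10, 1)]   # run A immediately; B waits for A
wait_for_b = [(1, 1, 1), (0, 2, 10)]       # idle on [0, 1), run B, then A

print(total_flow_time(start_at_once))
print(total_flow_time(wait_for_b))
```

Starting A at once yields a total flow time of 20, whereas idling for one time unit and running B first yields 13. A policy could make the second choice only by knowing at time 0 that B is about to arrive.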

In a static problem closely related to the dynamic one, all information is
available at the outset, but the jobs have "release" times indicating the earliest allowable
start time for each [DEOG83]. If all jobs have length one, there is a polynomial-time
solution for any number of processors, as discovered by Lawler [LAWL64].
Interestingly, the same problem on processors with differing speeds was still open according
to the 1981 classification of [LAGE81b]. As mentioned above, in the preemptive
case, SRTF is useful but not always optimal. Martin-Vega and Ratliff [MART77]
point out that SRTF does, in fact, maximize the makespan!

Turning to the central problem of this chapter, the exact scheduling problem to
be studied must be made precise. The general assumptions presented in Section 1.1
hold here as throughout the remainder of the dissertation. The objective of the
scheduling is to minimize the average turnaround time of a set of independent jobs,
or equivalently, to minimize the total flow time of the set of jobs. Therefore, the
problem appears to fall into the class of problems given in Section 1.1 as
$P / \emptyset / \sum C_j$, where the "$\emptyset$" stands for the empty precedence relation.

The novelty here, however, is that each job is considered to consist of a number
of non-preemptable "tasks," which are related to one another by a precedence
relation, called "$\rightarrow$." This task structure not only allows the job to be preempted
between tasks but also to have two or more of its tasks run, perhaps concurrently, on
different processors. In other words, the schedulable units are the tasks rather than
the jobs. The effect of the precedence relation is to restrict the order in which the
tasks may be executed: $T \rightarrow T'$, or "T precedes T'," implies that T must be
completely executed before T' can start. T is called an "immediate predecessor" of T'
and T' an "immediate successor" of T, written $T \rightarrow_I T'$, if $T \rightarrow T'$ and there is no
further task T* such that $T \rightarrow T^* \rightarrow T'$. A precedence relation is often given by
the directed acyclic graph (DAG) of the immediate successor relation, in which the
nodes are the tasks and an arrow is drawn from T to T' if and only if $T \rightarrow_I T'$.

As  an  artificial  illustration,  suppose  that  a  job  consists  of  tasks  labeled  T2,  T'3, 
Tg  and  a  precedence  relation  which  satisfies  the  condition  that  T,-  — ►/  Tj  if  and 
only  if  i  divides  j  or  j-'+l.  The  corresponding  DAG  is  then  the  one  shown  in  Figure 
2.1. 


Figure  2.1.   A  Sample  Precedence  DAG 

Notice  that  Tq  but  it  is  not  true  that  — ►/  T^:  hence  no  arrow  is  drawn 

from  T2  to  Tg. 
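The difference between the full precedence relation and its immediate-successor DAG can be sketched in a few lines. The relation used below (tasks numbered 2 through 8, with task i preceding task j whenever i properly divides j) is a hypothetical choice made for this sketch and is not claimed to reproduce Figure 2.1; the function name `immediate_successors` is likewise an assumption.

```python
def immediate_successors(tasks, precedes):
    """Map each task to its immediate successors under a transitive relation.

    `precedes(a, b)` must return True exactly when a precedes b.
    A successor s of t is immediate when no task x lies strictly between them.
    """
    tasks = list(tasks)
    dag = {}
    for t in tasks:
        dag[t] = sorted(
            s for s in tasks
            if precedes(t, s)
            and not any(precedes(t, x) and precedes(x, s) for x in tasks)
        )
    return dag

def properly_divides(i, j):
    # Hypothetical precedence relation: i precedes j when i is a proper divisor of j.
    return i < j and j % i == 0

dag = immediate_successors(range(2, 9), properly_divides)
for task, successors in dag.items():
    print(task, "->", successors)
```

Under this relation task 2 precedes task 8, but 8 is not an immediate successor of 2 because task 4 lies between them; the DAG therefore carries arrows 2 -> 4 and 4 -> 8 but none from 2 to 8.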

With the introduction of the task structure within each job, the resulting
scheduling problem for minimizing the average turnaround time of the jobs no longer
falls into the classification scheme; in fact, the problem does not appear to have been
dealt with in the literature before. We introduce here the notation

$$ P / \text{internal-prec} / \sum C_j \qquad (1) $$

to represent this problem as an extension to the notation of [LAGE81b] discussed
above. In order to make it clear that minimizing the average turnaround time of a
collection of such jobs is not the same as minimizing the average turnaround time of
all of the tasks that make up the jobs, consider the following simple example to be
scheduled on two processors.

J_1 = T_1:1   T_2:4        J_2 = T_3:2   T_4:2        (2)

Here, the number following each task is the required processing time, and the
absence of arrows indicates that the precedence relations are empty. Even in this
very simple case it can be seen that the Gantt chart of Figure 2.2(a) gives an optimal
schedule for minimizing the average turnaround time (ATT) of the tasks (giving a
value of three), whereas Figure 2.2(b) obtains a better average turnaround time for
the jobs (a value of four as opposed to that of 4.5 for Figure 2.2(a)).


(a)  ATT(tasks) = (1+2+3+6)/4 = 3
     ATT(jobs) = (6+3)/2 = 4.5

(b)  ATT(tasks) = (2+2+3+6)/4 = 3.25
     ATT(jobs) = (6+2)/2 = 4

Figure  2.2.    Job  Versus  Task  Turnaround  Time 
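The arithmetic behind the two panels can be reproduced as follows. This is a sketch under one assumption: the figures give only the sets of completion times, so the assignment of a completion time to each individual task below is inferred from the task lengths in (2) and the stated job completion times.

```python
# Task membership follows (2): J1 = {T1, T2}, J2 = {T3, T4}.
JOBS = {"J1": ["T1", "T2"], "J2": ["T3", "T4"]}

# Completion time of each task (inferred assignment, see the note above).
schedule_a = {"T1": 1, "T3": 2, "T4": 3, "T2": 6}   # shortest task first
schedule_b = {"T3": 2, "T4": 2, "T1": 3, "T2": 6}   # both tasks of J2 first

def att_tasks(done):
    """Average turnaround time over tasks (all tasks submitted at time 0)."""
    return sum(done.values()) / len(done)

def att_jobs(done):
    """Average turnaround time over jobs; a job ends when its last task ends."""
    finish = [max(done[t] for t in tasks) for tasks in JOBS.values()]
    return sum(finish) / len(finish)

for name, sched in (("(a)", schedule_a), ("(b)", schedule_b)):
    print(name, "ATT(tasks) =", att_tasks(sched), " ATT(jobs) =", att_jobs(sched))
```

Schedule (a) gives 3 for the tasks and 4.5 for the jobs; schedule (b) gives 3.25 for the tasks and 4 for the jobs, matching the values quoted in the figure.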

Due to the known results about traditional scheduling problems, it is easily seen
that the static problem presented here is NP-hard, and this is stated in the first
theorem.

THEOREM 2.1: The problem $P / \text{internal-prec} / \sum C_j$ is NP-hard.

PROOF: In the simple case of a single job, minimizing the turnaround time is
the same as minimizing the makespan. A single job in the given problem, however,
consists of tasks in an arbitrary precedence relation, and this problem (even with all
unit-time tasks) was shown by Ullman in 1975 [ULLM75] to be NP-hard. Therefore
the given problem, being clearly more difficult than a particular case, is NP-hard. □

For  the  static  problem,  the  following  section  obtains  a  best-case  lower  bound  on 
the  time  required  by  any  schedule  and  obtains  a  possible  worst-case  upper  bound  on 
the  time  required  by  an  extended  version  of  SJF.  For  the  dynamic  (non-preemptive) 
problem,  as  in  the  traditional  case,  there  can  be  no  truly  optimal  algorithm  due  to 
the  random  arrivals,  but  succeeding  sections  present  the  results  of  simulation  work 
comparing  a  variety  of  heuristic  algorithms  intended  to  lower  average  turnaround 
time. 

2.2. Analytic Approach to Heuristic Algorithms

In the case of problems which are known to be NP-hard, the only practical
recourse is to find suitable heuristic scheduling methods which produce suboptimal
but reasonable schedules. Many such heuristic methods have been suggested and
implemented, and many comparative studies of such methods have been done,
particularly by industrial engineers concerned with factory machine scheduling problems
[RUSS84]. It has been shown in simulations, as well as in actual applications, that
the SJF discipline outperforms other reasonable heuristics consistently in a wide
variety of situations where it is no longer optimal (always with the objective of
minimizing average turnaround time) [BRUN81, BAKE74a, CONW67, DEOG83].
Analytic methods have also been applied to show, for example, that SJF has the
optimal expected turnaround time in a nondeterministic system with exponentially
distributed arrival times and service times [BRUN81]. This is an example of
"average case analysis."

With some other scheduling problems, such as minimizing the makespan of a
number of independent tasks, "worst case analysis" has been applied to establish
upper bounds on performance. The best known such result is that of R. L. Graham,
who showed that the Longest Job First (LJF) strategy is never worse than 1.333
times that of an optimal strategy [GRAH69]. Later work has produced more
sophisticated algorithms with better worst-case bounds, such as the Multifit algorithm of
Coffman, Garey, and Johnson [COFF78].

In this chapter certain heuristic strategies based on the SJF and SRTF
algorithms are discussed and simulation is used to compare them against each other and
against a "random" scheduling strategy. The simulator assumptions and program are
discussed and the results analyzed here. The heuristic strategies tested are all
"two-level" strategies which use one method to select the job to be run next and another to
select the task within that job.

To  begin  with,  an  analysis  is  made  of  an  extended  version  of  the  SJF  algorithm 
as  a  way  of  motivating  its  use  as  a  heuristic  strategy  for  the  problem  in  hand  and  of 
gaining  some  perspective  on  its  efficacy. 

Start with a collection of jobs $J_1, J_2, \ldots, J_n$, and suppose that each job $J_i$
consists of a number of tasks related by a precedence relationship in the form of a DAG.
Let $u_i$ be the total processing time of $J_i$; in other words, $u_i$ is the sum of the
processing times of the tasks of $J_i$. Finally, assume that the jobs are numbered in
nondecreasing order of total processing time:

$$ u_1 \le u_2 \le \cdots \le u_n. $$

The scheduler may only schedule a task to be performed if all its predecessors have
finished. The objective is to minimize the total flow time, or the sum of the
turnaround times of the jobs (not tasks). We further assume m identical processors
$P_1, P_2, \ldots, P_m$. As always, any task may be scheduled on any processor. The first
algorithm is the classical SJF, called here Sequential SJF for emphasis.

Algorithm 2.1. SSJF (sequential shortest job first)

Whenever a processor is free, assign the next job in numeric order (hence in size
order) to that processor and run it to completion without interruption. If more than
one processor is available at the same time, assign to the lowest numbered processor
first.

Algorithm 2.2. CSJF (concurrent shortest job first)

Whenever a processor is free, assign a ready task from the next job in numeric
order to that processor and run the task to completion. If more than one task from
the lowest numbered job is ready at the same time, choose the lowest numbered
task. (This is referred to as First Available Task (FAT) in Section 2.4.) If more
than one processor is available at the same time, assign to the lowest numbered
processor first.
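The two algorithms can be sketched as small event-driven routines. What follows is a minimal illustration and not the simulator described later in this chapter: the job representation (a dictionary mapping a task name to a pair of processing time and predecessor list), the function names, and the reading of "next job in numeric order" as "lowest numbered job that has a ready task" are all assumptions made for this sketch. Task lengths are taken to be integers so that simultaneous completions compare equal.

```python
import heapq

def ssjf(jobs, m):
    """Sequential SJF: each job runs whole on one processor; returns job completion times."""
    free_at = [(0, p) for p in range(m)]        # (time free, processor number)
    completion = []
    for job in jobs:                            # jobs are already in size order
        start, p = heapq.heappop(free_at)       # earliest free, lowest numbered processor
        end = start + sum(length for length, _ in job.values())
        completion.append(end)
        heapq.heappush(free_at, (end, p))
    return completion

def csjf(jobs, m):
    """Concurrent SJF: tasks are the schedulable units; returns job completion times."""
    started = [set() for _ in jobs]
    done = [set() for _ in jobs]
    completion = [0] * len(jobs)
    running = []                                # heap of (finish time, job index, task)
    now, free = 0, m
    remaining = sum(len(job) for job in jobs)
    while remaining:
        # Hand free processors to ready tasks, lowest numbered job and task first.
        for j, job in enumerate(jobs):
            for task in sorted(job):
                length, preds = job[task]
                if free and task not in started[j] and set(preds) <= done[j]:
                    started[j].add(task)
                    heapq.heappush(running, (now + length, j, task))
                    free -= 1
        # Advance to the next completion and retire every task ending at that time.
        now = running[0][0]
        while running and running[0][0] == now:
            _, j, task = heapq.heappop(running)
            done[j].add(task)
            completion[j] = now                 # the last task to end fixes the job's end
            free += 1
            remaining -= 1
    return completion

# The two jobs of example (2), listed in size order: J2 (size 4) before J1 (size 5).
j2 = {"T3": (2, []), "T4": (2, [])}
j1 = {"T1": (1, []), "T2": (4, [])}
print("SSJF:", ssjf([j2, j1], 2), " CSJF:", csjf([j2, j1], 2))
```

On the two jobs of example (2) with two processors, this sketch completes the jobs at times 4 and 5 under SSJF (total flow time 9) and at times 2 and 6 under CSJF (total flow time 8), the latter being the schedule of Figure 2.2(b).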

LEMMA 2.1: With SSJF, job $J_k$ with $k = qm + r$, $1 \le r \le m$, is assigned to processor
$P_r$ and scheduled to start at time

$$ s_k = \begin{cases} 0 & \text{for } k \le m \\ u_r + u_{r+m} + \cdots + u_{r+(q-1)m} & \text{for } k > m. \end{cases} $$

PROOF: For the first m jobs, this is obvious. Since the jobs are listed in
non-decreasing order, the times that the processors next become available also form a
non-decreasing sequence as the processor number varies from 1 to m. Therefore,
SSJF assigns the (m+1)st job to $P_1$ to start when $J_1$ finishes: time $u_1$. Similarly,
the next job is assigned to $P_2$, and so on, cycling through the processors in numeric
order. The start times are calculated easily as the sum of the processing times of the
jobs previously assigned to the corresponding processors. □
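The cyclic assignment asserted by the lemma can be checked numerically. In the sketch below the job sizes and the two function names are assumptions made for illustration; the first function evaluates the sum of the sizes of the earlier jobs on the same processor, and the second obtains the start times by directly simulating the rule of Algorithm 2.1.

```python
import heapq

def start_times_closed_form(u, m):
    """Start time of job k as the sum u_r + u_(r+m) + ... + u_(r+(q-1)m)."""
    starts = []
    for k in range(1, len(u) + 1):
        q, r = divmod(k - 1, m)
        r += 1                                   # now k = q*m + r with 1 <= r <= m
        starts.append(sum(u[r + j * m - 1] for j in range(q)))
    return starts

def start_times_simulated(u, m):
    """Start times found by giving each job to the earliest free processor."""
    free_at = [(0, p) for p in range(1, m + 1)]  # ties go to the lowest numbered processor
    starts = []
    for size in u:
        t, p = heapq.heappop(free_at)
        starts.append(t)
        heapq.heappush(free_at, (t + size, p))
    return starts

u = [1, 2, 3, 4, 5, 6, 7]                        # hypothetical sizes, nondecreasing
print(start_times_closed_form(u, 3))
print(start_times_simulated(u, 3))
```

For these seven jobs on three processors both computations give the start times 0, 0, 0, 1, 2, 3, 5.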

LEMMA 2.2: (1) SSJF and CSJF never leave a processor idle until all jobs have
been scheduled. More precisely, if a processor is idle on some interval of time [t, t'),
then all jobs were scheduled to start no later than time t.

(2) CSJF never leaves a processor idle unless all unfinished jobs are running. More
precisely, if a processor is idle on some interval of time [t, t'), then every unfinished
job has at least one task running on that interval (or finishes during the interval).

PROOF:  (1)  This  is  an  easy  consequence  of  the  way  that  the  algorithms 
operate:  whenever  a  processor  becomes  free,  something  is  immediately  scheduled  on 
it  unless  there  is  nothing  left  to  schedule. 

(2) If some unfinished job is not running, then it must have at least one ready
task. Therefore as soon as a processor becomes free, that task (or some other) will
be scheduled on it; in other words, any time there is an idle processor there can be no
unfinished jobs with no tasks running. □

LEMMA 2.3: Call a job active at time t if it was started before time t but has not
finished by time t. Under the CSJF scheduling policy, at any time t there can be no
more than m active jobs.

PROOF: Suppose there are m+1 active jobs at time t. Let J be the highest
numbered (longest) of these jobs, and hence the latest one to start. Let s(J) < t be
the time it was scheduled to start. At that time the other m lower-numbered
(shorter) jobs were already active, so none of them could have had a task ready at
time s(J). But the only reasons that an unfinished job has no task ready are that all
tasks are already scheduled or that each unscheduled task has some predecessor
currently running. In either case, some task of each of the unfinished jobs must have
been running at time s(J). This is a contradiction to the fact that there are only m
processors and there were at least m+1 jobs (counting job J) running at time s(J).
Therefore no such set of more than m jobs can exist. □

LEMMA 2.4: At any time t before all jobs are completed and for any k between
one and the minimum of m and the number of unfinished jobs, under the CSJF
scheduling policy there must be k processors busy running only the k lowest
numbered (shortest) unfinished jobs.

PROOF: The lowest numbered unfinished job has the highest priority.
Therefore once it gains this status, no other job can preempt it and it will always be
running on some processor. (Whenever one of its tasks finishes either another will be
ready or another will still be running on another processor.) Therefore the lemma is
true for k = 1. Assume that it is true for k-1, for some 1 < k ≤ m, and that there are
at least k unfinished jobs at time t.

By the induction hypothesis there are k-1 processors which are running only
the k-1 lowest numbered unfinished jobs at time t. Let J be the k-th lowest
numbered unfinished job at that time. If J is also running on some processor at time t,
then the lemma is true for case k. If, on the other hand, J has not yet been started,
then the only jobs running on all m processors must be lower numbered than J.
Since CSJF does not leave processors idle as long as there are jobs that have not yet
started (Lemma 2.2), there are m ≥ k processors running the first k-1 unfinished
jobs. Finally, if J has already been started, but J is not running at time t, let $t_0$ be
the last time before t that a task of J was completed. By the induction hypothesis,
at any time before $t_0$ there must be k-1 processors running the k-1 lowest
numbered unfinished jobs. Since from $t_0$ to t no task of J is running even though J is
unfinished at this time, any processor that became free must have been occupied by a
task from a job of lower number. Moreover, no processor can be left idle during this
interval since at least job J has a ready task at this time. In particular, the
processor running J and the other k-1 processors running lower numbered jobs must all
continue to run jobs numbered lower than J from $t_0$ to t. At time t, the same
argument prevails since there are still at least k unfinished jobs and J is not running.
Therefore the lemma is true for k, and by induction it is true for all specified cases. □

LEMMA 2.5: The total flow time of the first k jobs under CSJF is no more than the
total flow time of the first k jobs under SSJF for all k = 1, 2, ..., n.

PROOF: The flow time (or turnaround time) ft(J) of a particular job J is
given by ft(J) = s(J) + (time that J is running) + (time that J is active but not
running). We argue that for each of these three terms, if J is doing worse than it
would under SSJF, then some other lower numbered jobs are doing better.

Under SSJF, the time that J is running must be just u(J), the total processing
time, while under any policy it cannot be more. Therefore, CSJF does as well or
better on this term.

Under SSJF, the job J is never active and not running since each job is run
without interruption once started. If, under CSJF, an active job J is not running on
some interval [t, t'), then by Lemma 2.4, there must be k processors busy running
the k lowest numbered jobs, where J is the k-th lowest. But since J itself is not
running, the k processors are running only k-1 jobs. Consequently, at any given
time in the interval, one of those lower-numbered jobs must be running on two
processors. This means that while ft(J) is lengthened by t'-t, the turnaround times of
some other lower-numbered jobs must be shortened by a total amount of at least
t'-t. Therefore, CSJF is doing at least as well as SSJF.

Now let S(J) be the start time of J under SSJF and suppose that J is the
lowest numbered job such that s(J) > S(J). Then up to time s(J) at least job J
had not started, so by Lemma 2.2 no processors were idle before this time; hence, the
system under CSJF has dedicated

$$ m \times s(J) \qquad (3) $$

units of processing time to running the jobs numbered lower than job J by time
s(J). On the other hand, after time S(J), the policy SSJF is using one processor to
run job J and hence is dedicating at most

$$ m \times S(J) + (m-1) \times (s(J) - S(J)) = S(J) + (m-1) \times s(J) \qquad (4) $$

units of processing time to these jobs by time s(J). In other words, CSJF put in
s(J) - S(J) extra units of processing time running the lower-numbered jobs before
starting J. Let J' be a lower numbered job which received, say, dt more units of
processing under CSJF than under SSJF by time s(J). Then at time s(J), J' has dt
less time to go under CSJF unless it is delayed after time s(J). As observed in the
foregoing paragraph of this proof, however, such a delay cannot cause any increase in
the total active time of the jobs up to and including job J. Moreover, by the
minimality of J, all of the jobs up through J start at least as early under CSJF as
under SSJF. Therefore, the total flow time of the jobs up through J is at least dt
less under CSJF. Since a total of s(J) - S(J) extra time units were received by the
jobs before J, the total flow time under CSJF of the jobs before J is at least
s(J) - S(J) less than under SSJF, and therefore the total flow time including job J
is no more than that under SSJF. □

THEOREM 2.2: The total flow time achieved by CSJF is at least as small as that
achieved by SSJF.

PROOF: This is an immediate consequence of Lemma 2.5 taking k = n. □

Turning now to obtaining a worst case bound for CSJF, Theorem 2.2 assures
that the flow time of SSJF can be used. The following well-known result [BAKE74a]
indicates that this flow time is easily calculated.

LEMMA 2.6: The total flow time of n jobs run on m processors under SSJF is given
by

$$ FT = \sum_{i=1}^{n} \left\lceil \frac{n-i+1}{m} \right\rceil u_i \qquad (5) $$

where the symbol $\lceil a \rceil$ means the least integer greater than or equal to a.


PROOF:

$$ FT = \sum_{i=1}^{n} ft(J_i) = \sum_{i=1}^{n} \; \sum_{k \equiv i \ (\mathrm{mod}\ m),\ 0 < k \le i} u_k \qquad \text{\{by Lemma 2.1\}}. $$

Each $u_k$ appears in this expression as many times as there are integers between k
and n which are congruent to k modulo m. These are just the numbers
k, k+m, k+2m, ..., and there are $\lceil (n-k+1)/m \rceil$ of them, proving the lemma. □
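The closed form of the lemma can be compared with a direct computation. The sketch below is illustrative; the job sizes and function names are assumptions, and the simulation simply applies the SSJF rule and adds up the completion times.

```python
import heapq

def ssjf_flow_simulated(u, m):
    """Total flow time under SSJF, found by running the jobs in size order."""
    free_at = [0] * m                 # times at which the m processors become free
    total = 0
    for size in u:
        finish = heapq.heappop(free_at) + size
        total += finish               # all jobs are submitted at time 0
        heapq.heappush(free_at, finish)
    return total

def ssjf_flow_closed_form(u, m):
    """Total flow time as the sum over i of ceil((n - i + 1) / m) * u_i."""
    n = len(u)
    # -(-a // m) is the integer ceiling of a / m, avoiding floating point.
    return sum(-(-(n - i + 1) // m) * u[i - 1] for i in range(1, n + 1))

u = [1, 2, 3, 4, 5, 6, 7]             # hypothetical sizes, nondecreasing
print(ssjf_flow_simulated(u, 3), ssjf_flow_closed_form(u, 3))
```

Both computations give a total flow time of 39 for these seven jobs on three processors.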


COROLLARY TO THEOREM 2.2: The total flow time of n jobs run on m
processors under CSJF is never more than

$$ \sum_{i=1}^{n} \left\lceil \frac{n-i+1}{m} \right\rceil u_i. $$


THEOREM 2.3: Let ft be the flow time for n jobs on m processors under CSJF
and let $ft_{opt}$ be the flow time for the same set of jobs under the optimal
schedule. Then

$$ \frac{ft}{ft_{opt}} \le 1 + \frac{\sum_{i=1}^{n} p_i u_i}{\sum_{i=1}^{n} (n-i+1)\, u_i} \qquad (6) $$

where $p_i$ = m - (the remainder of dividing n - i + 1 by m) if this remainder is
non-zero, and $p_i$ = 0 if the remainder is 0.

PROOF: The best possible case for scheduling n jobs on m processors occurs
when each job $J_i$ consists of m equal-sized tasks of length $u_i/m$ which can be
scheduled to run concurrently. Then the optimal flow time would be

$$ ft_{uniopt} = \sum_{i=1}^{n} (n-i+1)\, u_i / m. \qquad (7) $$

In general, then, $ft_{uniopt} \le ft_{opt} \le ft \le FT$, the flow time under SSJF, and

$$ \frac{ft}{ft_{opt}} \le \frac{FT}{ft_{uniopt}} = \frac{\sum_{i=1}^{n} \lceil (n-i+1)/m \rceil\, u_i}{\sum_{i=1}^{n} (n-i+1)\, u_i / m}. \qquad (8) $$


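Observe that if a = mq + r with 0 ≤ r < m, then

$$ \left\lceil \frac{a}{m} \right\rceil = \left\lceil q + \frac{r}{m} \right\rceil = \begin{cases} q & \text{if } r = 0 \\ q+1 & \text{otherwise.} \end{cases} $$
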


Therefore, $m \lceil a/m \rceil$ = mq or mq + m, which is the same as a (if r = 0) or
a + m - r (if r > 0). By multiplying numerator and denominator of (8) by m and
applying the preceding observation with a = n - i + 1,

$$ \frac{ft}{ft_{opt}} \le \frac{\sum_{i=1}^{n} (n-i+1+p_i)\, u_i}{\sum_{i=1}^{n} (n-i+1)\, u_i} = 1 + \frac{\sum_{i=1}^{n} p_i u_i}{\sum_{i=1}^{n} (n-i+1)\, u_i}. \qquad \Box $$
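The bound of Theorem 2.3 is easy to evaluate for a concrete set of job sizes. The sketch below is illustrative only: the sizes and the function names `p` and `theorem_2_3_bound` are assumptions, and exact rational arithmetic is used so that the result can be compared with the ratio of (5) to (7).

```python
from fractions import Fraction

def p(a, m):
    """m minus the remainder of a divided by m, or 0 when the remainder is 0."""
    r = a % m
    return m - r if r else 0

def theorem_2_3_bound(u, m):
    """Right-hand side of (6) for integer job sizes u (nondecreasing) on m processors."""
    n = len(u)
    num = sum(p(n - i + 1, m) * u[i - 1] for i in range(1, n + 1))
    den = sum((n - i + 1) * u[i - 1] for i in range(1, n + 1))
    return 1 + Fraction(num, den)

u = [1, 2, 3, 4, 5, 6, 7]              # hypothetical sizes, nondecreasing
print(theorem_2_3_bound(u, 3))

# The observation used in the proof: m * ceil(a / m) equals a + p(a, m).
print(all(3 * -(-a // 3) == a + p(a, 3) for a in range(1, 50)))
```

For these sizes on three processors the bound evaluates to 39/28, which is the SSJF flow time 39 from (5) divided by the idealized flow time 84/3 = 28 from (7).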

A few comments are in order to interpret this result. If all the $u_i$ are equal,
then CSJF picks the next job more or less arbitrarily; therefore, how well it
performs depends on how it schedules the tasks of each job. In this case the inequality
of Theorem 2.3 becomes

$$\frac{ft}{ft_{opt}} \le 1 + \frac{\sum_{i=1}^{n} \rho_i}{\sum_{i=1}^{n} (n-i+1)} = 1 + \frac{\sum_{i=1}^{n} \rho_i}{n(n+1)/2}. \tag{9}$$

If n is an exact multiple of m, say n = rm, then $\sum_{i} \rho_i = rm(m-1)/2$ and this becomes

$$\frac{ft}{ft_{opt}} \le 1 + \frac{rm(m-1)}{n(n+1)} = 1 + \frac{m-1}{n+1}.$$
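The bound of Theorem 2.3 is easy to evaluate numerically. The following is a minimal Python sketch, not code from the dissertation: the function names `rho` and `csjf_ratio_bound` are hypothetical, job sizes are assumed to be integers listed in nondecreasing order, and exact fractions are used so that the equal-size case with n = rm can be compared against the closed form 1 + (m-1)/(n+1).

```python
from fractions import Fraction


def rho(n, m, i):
    """rho_i of Theorem 2.3: m minus the remainder of (n - i + 1) / m, or 0."""
    remainder = (n - i + 1) % m
    return m - remainder if remainder else 0


def csjf_ratio_bound(u, m):
    """Right-hand side of (6); u[i - 1] plays the role of u_i (integers assumed)."""
    n = len(u)
    numerator = sum(rho(n, m, i) * u[i - 1] for i in range(1, n + 1))
    denominator = sum((n - i + 1) * u[i - 1] for i in range(1, n + 1))
    return 1 + Fraction(numerator, denominator)


# Illustrative values only: equal job sizes with n an exact multiple of m.
m, r = 4, 3
n = r * m
bound = csjf_ratio_bound([1] * n, m)
closed_form = 1 + Fraction(m - 1, n + 1)
print(bound, closed_form)
```

With unequal job sizes the same function evaluates the general bound (6) directly.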

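If no assumption is made on the sizes of the jobs, then it is still possible to write

$$\frac{ft}{ft_{opt}} \le 1 + \frac{\sum_{i=1}^{n} \rho_i u_i}{\sum_{i=1}^{n} (n-i+1)\, u_i} \le 1 + \frac{(m-1)\, u_n}{(n+1)\, u_1}. \tag{10}$$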


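If at least

$$\frac{u_n}{u_1} \le \frac{n+1}{m-1},$$

then $ft/ft_{opt} \le 2$, and the ratio tends to 1 if n gets much larger than m while $u_n/u_1$
remains constant. Looking at it another way:

$$\frac{ft}{ft_{opt}} \le \frac{FT}{ft_{uniopt}} = \frac{\sum_{i=1}^{n} (n-i+1)\, u_i}{\sum_{i=1}^{n} (n-i+1)\, u_i / m} = m. \tag{11}$$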


Therefore, combining (10) and (11) gives


COROLLARY TO THEOREM 2.3:

$$\frac{ft}{ft_{opt}} \le 1 + (m-1) \times \min\left\{ 1, \frac{u_n}{(n+1)\, u_1} \right\}.$$


2.3.  Multiprocessor  Simulator


This section describes the simulator developed in order to compare a number of
different heuristic scheduling strategies in an environment such as that described in
Section 2.1. The simulator consists mainly of a driver program which acts like a
multiprocessor system of hardware and appropriate interrupts, a high-level scheduler
which determines admission into the system of new jobs, and a number of
interchangeable dispatchers which embody the different scheduling strategies. Whereas
the high-level scheduler reads job information from a job file and initializes the
appropriate data structures containing the necessary job information, the dispatcher
is capable of scanning the job "queue" and selecting the appropriate task according to
the given discipline. The dispatcher also updates job and task information and
informs the high-level scheduler when a job is completed. Two undergraduate
students at the University of Florida, Dennis Suppe and Borden Wilson, assisted in
programming the simulator [SUPP86, WILS86]. The actual Pascal code appears in
[WILS86].

There  are  a  number  of  design  decisions  which  critically  affect  the  results  of  the 
simulations.  First,  it  is  assumed  that  the  scheduling  itself  contributes  no  overhead. 
Thus  the  system  maintains  a  "clock"  for  each  processor  which  is  simply  updated  by
the  length  of  each  task  scheduled  on  that  processor.  The  turnaround  time,  or  flow 
time,  of  each  job  is  then  calculated  as  "time  completed  -  time  entered  system." 
Second,  the  high-level  scheduler  maintains  a  constant  degree  of  multiprogramming 
(DM)  as  long  as  there  are  more  jobs  in  the  input  file.  This  means  that  at  the  start  of 
the  simulation  a  value  for  the  DM  is  chosen  and  the  simulator  enters  DM  jobs  into 
the  system  at  time  zero.  Henceforth,  whenever  a  job  is  finished,  the  scheduler  reads 
a  new  job  with  starting  time  equal  to  the  finish  time  of  the  job  just  terminated. 
Once  the  file  is  emptied  of  jobs,  the  simulation  continues  until  all  the  jobs  remaining 
in  the  system  are  completed. 

A third element of the design of the simulator is that at all times the scheduler
has at its disposal all the unscheduled tasks of all the jobs in the system, together
with the necessary information to implement the algorithms described below. In
particular, the scheduler must know the total processing time (TPT = u(J)) of each job,
the processing time of each remaining task, the precedence relations among the tasks,
which tasks belong to which jobs, and, for some of the algorithms, additional
information. All of this information is stored within a number of two-dimensional
matrices, one for each job in the system. The matrix is essentially an adjacency
matrix for the DAG representing the precedence relation of the job, where the (i, j)
entry is one if task $T_i$ is an immediate predecessor of task $T_j$ and is zero otherwise.
This matrix is modified, however, in a number of ways. The diagonal, or (i, i),
entries are set equal to the processing times of the corresponding tasks. The tasks
are always numbered such that if $T_i$ precedes $T_j$, then i < j. This guarantees that
no entries of 1 will appear below the diagonal, and therefore these entries can be
used to store other information about the job. Finally, the matrix is actually
augmented by an extra row and column, so that if there are (at most) ten tasks, then the
matrix has subscripts running from zero to eleven. A one in the (0, j) entry, for
example, indicates that task j is available for scheduling (not yet scheduled but no
unscheduled predecessors).
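The matrix layout just described can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the Pascal code of [WILS86]: the function name `build_job_matrix` and its parameters are hypothetical, and only the uses of the matrix that the text states explicitly (precedence entries, diagonal processing times, and the availability flags in row 0) are filled in.

```python
def build_job_matrix(times, edges, max_tasks=10):
    """Build the augmented precedence matrix for one job (hypothetical helper).

    times[k - 1] is the processing time of task T_k (tasks numbered from 1).
    edges is a list of pairs (i, j) meaning T_i immediately precedes T_j.
    """
    size = max_tasks + 2                      # subscripts 0 .. max_tasks + 1
    a = [[0] * size for _ in range(size)]
    for k, t in enumerate(times, start=1):
        a[k][k] = t                           # diagonal holds processing times
    for i, j in edges:
        if not 1 <= i < j <= len(times):
            raise ValueError("tasks must be numbered so that predecessors come first")
        a[i][j] = 1                           # precedence entries lie above the diagonal
    for j in range(1, len(times) + 1):
        has_predecessor = any(a[i][j] == 1 for i in range(1, j))
        a[0][j] = 0 if has_predecessor else 1  # row 0 flags the available tasks
    return a


# Illustrative job: T_1 precedes both T_2 and T_3.
a = build_job_matrix([2.0, 1.0, 3.0], [(1, 2), (1, 3)])
print(a[0][1:4])
```

In this example only task 1 is flagged as available at the start; a dispatcher would set further flags in row 0 as predecessors are scheduled.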

Besides  the  simulator  program  itself,  two  other  important  auxiliary  programs 
have  been  developed:  a  job  pool  generator  which  constructs  files  of  jobs  with  various 
characteristics  and  a  statistical  analysis  package  which  analyzes  the  output  of  the 
simulator  to  give  information  on  the  relative  performance  of  the  various  dispatchers. 

2.4.  Simulation  Results 

The  class  of  algorithms  being  simulated  can  be  described  as  "bilevel."  These 
algorithms  use  one  criterion  for  the  selection  of  the  next  job  to  be  scheduled  and  a 
different  criterion  for  the  selection  of  the  specific  task  within  the  chosen  job.  This 
general  strategy  results  from  the  necessity  to  order  the  execution  of  whole  jobs  while 
at  the  same  time  selecting  a  distribution  of  the  tasks  within  each  job  to  complete  the 
chosen  jobs  as  quickly  as  possible.  Specific  algorithms  are  created  by  combining  one 
of  the  "job-level"  strategies  with  one  of  the  "task-level"  strategies. 

All of the "intelligent" strategies selected for testing are based on the intuitive
idea of running the smallest jobs first and doing so as quickly as possible. Two
related but different measures of the "smallness" of the job are used: the total
processing time (TPT) of the job and the critical path time (CPT) of the job. The TPT
of a job has been represented by $u_i$ in this chapter, whereas CPT is the time
required to execute the longest chain of tasks in the job. Effectively, CPT gives a
minimum time required to run the job on any number of processors, while TPT is
the time the job would take to run on one processor without interruptions. Once a
job is chosen according to one of these criteria, a simple method of choosing an
appropriate task is to select that task with the most immediate successors. This
tends to provide as many tasks as possible available at any given moment and
therefore allow as many processors as possible to cooperate in finishing the job
quickly. Another approach is to run the task heading the longest remaining chain of
unscheduled tasks. Combining these methods with other related ones and some
"unintelligent" ones produces the following list of possibilities.

1.  Job Level

a.  SJF      Select first the job with the shortest total processing time (the sum of
             all the task processing times).

b.  SCPF     Select first the job with the shortest critical path (length of the longest
             chain of tasks).

c.  SRTF     Select first the job with the shortest remaining processing time.

d.  SRCPF    Select first the job with the shortest remaining critical path.

e.  Random   Select a random job.

2.  Task Level

a.  MISF     Select first the task with the most immediate successors.

b.  LRTF     Select the task heading the longest sequential chain of remaining tasks.

c.  FAT      Select the first (lowest numbered) available task.
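The way a job-level rule and a task-level rule combine into one bilevel dispatcher can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the simulator's code: jobs are represented here as dictionaries with a list of task times and a list of successor lists (tasks numbered from 0 so that predecessors come first), and only SJF, SCPF, MISF, and FAT are shown. All names are hypothetical.

```python
def tpt(job):
    """Total processing time: the sum of all task times."""
    return sum(job["times"])


def cpt(job):
    """Critical path time: length of the longest chain of tasks."""
    times, succ = job["times"], job["succ"]
    height = [0.0] * len(times)
    for t in reversed(range(len(times))):     # successors are processed first
        height[t] = times[t] + max((height[s] for s in succ[t]), default=0.0)
    return max(height)


def misf(job, available):
    """Most immediate successors first; ties go to the lowest-numbered task."""
    return max(available, key=lambda t: (len(job["succ"][t]), -t))


def fat(job, available):
    """First (lowest-numbered) available task."""
    return min(available)


JOB_LEVEL = {"SJF": tpt, "SCPF": cpt}
TASK_LEVEL = {"MISF": misf, "FAT": fat}


def dispatch(jobs, available, job_rule, task_rule):
    """Pick a job by the job-level rule, then a task of that job by the task-level rule."""
    candidates = [name for name in jobs if available.get(name)]
    name = min(candidates, key=lambda nm: (JOB_LEVEL[job_rule](jobs[nm]), nm))
    return name, TASK_LEVEL[task_rule](jobs[name], available[name])


# Illustrative jobs: four independent unit tasks versus a chain of three unit tasks.
jobs = {
    "wide": {"times": [1, 1, 1, 1], "succ": [[], [], [], []]},
    "long": {"times": [1, 1, 1], "succ": [[1], [2], []]},
}
available = {"wide": [0, 1, 2, 3], "long": [0]}
print(dispatch(jobs, available, "SJF", "FAT"))
print(dispatch(jobs, available, "SCPF", "MISF"))
```

Here SJF prefers the chain (TPT 3 against 4) while SCPF prefers the wide job (CPT 1 against 3), which is the kind of disagreement between the two size measures that the simulations were meant to probe.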

Just how "good" each of these methods might prove to be appears to be
dependent on some of the characteristics of the DAGs of the typical jobs being scheduled.
For example, jobs which consist of a large number of small, independent tasks may
take a long time to complete even though they have very short CPTs. Conversely,
jobs which are almost entirely sequential may take longer to finish than others with
larger TPTs but exhibiting more concurrency. It was therefore decided to run the
simulations on three different types of job pools, all with approximately the same
average number of tasks: "Wide Jobs," with small CPT, "Long Jobs," with most
tasks lying along the critical path, and "Random Jobs," with DAGs generated
randomly.

Before running the simulations, a number of hypotheses were made about the
expected effects of the different parameters on the turnaround times. These
were as follows:

H1.  As the number of processors increases, the differences between the scheduling
policies will decrease, since if there are enough processors, no ready task has to
wait and all reasonable scheduling policies produce the same results.

H2.  With higher degrees of multiprogramming, the differences between the policies
would be more apparent since at each moment the dispatcher has more jobs to
choose from and hence the choice is more critical.

H3.  The SJF and SRTF strategies should, in general, outperform the SCPF and
SRCPF with the "wide" jobs, because when there are many tasks but short
critical paths, the length of the critical path is a poor estimate of the time required
to run the job. This should be all the more true when there are few processors.

H4.  The SCPF and SRCPF strategies should, in general, outperform the SJF and
SRTF methods with the "long" jobs, because when a large number of the tasks
lie along the critical path, the length of this path becomes the determining
factor in how long it will take to finish the job. This should be even more
pronounced when there are many processors.

H5.  The Random job-level strategy should do markedly worse than any of the other
methods, except when there are many processors and a low degree of
multiprogramming.

The simulator was run on the VAX 11/780 of the University of Florida
CIRCA system with job files of approximately 500 jobs. Each run matched a job file
of given characteristics against a dispatcher using a certain strategy and a high-level
scheduler maintaining a particular level of jobs in the system. Moreover, this was
done for several different numbers, m, of processors. As can be seen in Table 2.1, for
the values chosen there are 840 possible different simulation "settings." Data were
actually collected on all but a few of these possibilities.

TABLE  2.1.  Factors  Considered  in  Simulation 

1.  Degree  of  Multiprog.:  5,  10,  20,  30 

2.  Job  Types:  Wide,  Long,  Random 

3.  Number  of  Processors:  3,  5,  7,  9,  11,  13,  15 

4.  Dispatcher 

Job  Level:  SJF,  SRTF,  SCPF,  SRCPF,  Random 

Task  Level:  MSF,  FAT 

The actual simulation results appear in Wilson's work [WILS86]. Two kinds of
statistical tests were applied to the data in order to check the significance of the
results. The first performed tests on hypotheses of the form $H_0$: $\mu_1 = \mu_2$ against
the alternative $H_a$: $\mu_1 > \mu_2$, where the $\mu_i$ represent the average turnaround times of
matched runs (equal levels of multiprogramming, numbers of processors, and job
characteristics). This is a standard test of hypotheses for the equality of the means
of two populations and is based on the use of a table of probabilities for the standard
normal distribution (z-values). The sample variances are used to estimate the
population variance. The details are reported in Ellis [ELLI86]. Appendix B shows the
cases in which the alternative hypothesis of the form "Strategy A is better than
Strategy B" could be accepted at the 90% confidence level. These tests of hypothesis
were performed on the raw data consisting of all the individual turnaround times,
and then these data were discarded [ELLI86]. (In total it comprised over two
megabytes of storage.)

The second analysis was a post-facto application of the Friedman Two-way
Analysis of Variance by Ranks [SIEG56]. This is a non-parametric test which was
carried out for each fixed degree of multiprogramming and fixed number of
processors. It tests the null hypothesis that all five job-level scheduling methods
produce the same average turnaround times against the alternative hypothesis that
there is a significant difference among these times. There are several reasons for the
application of this procedure:

1.  The tests of hypothesis carried out comparing two average turnaround times
were done using the standard techniques and the z values of the standard
normal distribution. Since the sample sizes were large (500), such tests should be
relatively reliable, but two requirements were not satisfied: the standard
deviations of the populations being compared were not at all equal, particularly when
comparing the Random scheduler with one of the more intelligent ones; and the
samples were not independent, since the different schedulers were run against
the same input data. Thus further testing was required.

2.  Since the raw data were not available, the test had to be run using each average
turnaround time result as a single data item. Although such sample averages
drawn from a given population are guaranteed by the Central Limit Theorem to
have an approximately normal distribution, averages corresponding to different
settings of the independent variables have very different standard deviations.
Moreover, unless data resulting from many different settings are lumped together,
the sample sizes for further testing are quite small.

3.  The Friedman Analysis of Variance method applies to

(a)  data classified by rank only,

(b)  data of unknown distribution and standard deviation,

(c)  dependent (matched) samples, and

(d)  testing for the equivalence of a number of means at the same time.

The approach chosen was then the following:

•  A fixed value of m (processors) and DM (degree of multiprogramming) are
chosen.

•  The five average turnaround times obtained for the five different job-level
scheduling methods with a fixed job type are treated as a matched set of data.
The three sets (one for each job type) give five dependent samples of three
values each (k = 5, N = 3) to which to apply the test.

•  The Friedman test is a rank test, so the five matched values are replaced by
their ranks, first through fifth, and the $\chi_r^2$ statistic is computed, where

$$\chi_r^2 = \frac{12}{Nk(k+1)} \sum_{j=1}^{k} R_j^2 - 3N(k+1),$$

with $R_j$ = the sum of the ranks of the j-th sample (scheduling method) over the N
matched sets. The $\chi_r^2$ value is then compared with the value for the .10 level of
significance (for k - 1 = 4 degrees of freedom): $\chi^2$ = 7.78.

•  This test is repeated for each of the seven values of m and four values of DM.
This is all repeated for the two different task-level scheduling methods, MSF
and FAT. The results showing significance at the .10 level appear in Appendix
A.
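The rank computation in the steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `friedman_statistic` is hypothetical, the turnaround values in the example are invented for illustration and are not simulation results, and ties within a matched set are assumed not to occur.

```python
def friedman_statistic(table):
    """Friedman chi_r^2 for table[row][column].

    Each row is one matched set (one job type); each column is one sample
    (one job-level scheduling method). Values in a row are assumed distinct.
    """
    n_rows = len(table)
    k = len(table[0])
    rank_sums = [0] * k
    for row in table:
        order = sorted(range(k), key=lambda c: row[c])
        for rank, c in enumerate(order, start=1):
            rank_sums[c] += rank              # rank 1 = smallest turnaround
    sum_of_squares = sum(r * r for r in rank_sums)
    chi_r2 = 12.0 * sum_of_squares / (n_rows * k * (k + 1)) - 3.0 * n_rows * (k + 1)
    return chi_r2, rank_sums


# Made-up averages: three job types (rows) by five schedulers (columns).
example = [
    [10.1, 10.4, 10.9, 11.3, 12.0],
    [20.2, 20.5, 21.1, 21.6, 23.0],
    [15.0, 15.3, 15.8, 16.1, 17.2],
]
chi_r2, rank_sums = friedman_statistic(example)
CRITICAL_VALUE = 7.78                         # .10 level, 4 degrees of freedom (from the text)
print(chi_r2, rank_sums, chi_r2 > CRITICAL_VALUE)
```

In this extreme example every matched set ranks the methods identically, so the statistic reaches its largest possible value for k = 5 and N = 3.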

Unfortunately, one of the most remarkable results is the generally small and
unpredictable differences among the various strategies. Since each simulation run
involved a large number (over 500) of simulated jobs, it was expected that
there would be relatively clear "visual" differences in the results with the different
scheduling policies. These differences did not materialize. Combining the results of
the two methods of analysis yields the following conclusions:

1.  Under the chosen design criteria and with the relatively small jobs (no more
than ten tasks per job) used as data, the turnaround times are only marginally
dependent on the scheduling strategy used.

2.  Significant differences are more apparent with a low (5) degree of
multiprogramming and with a high (> 9) number of processors, contrary to the
expectations H1 and H2 above.

3.  The only reasonably consistent finding was that the critical path methods
outperform the shortest job methods (SJF and SRTF), particularly at small DMs
and using the First Available Task task scheduler.

4.  Due to combining of the data from the different job types in applying the
Friedman test, no evidence for or against the hypotheses H3 and H4 above can be
derived from that test. Notwithstanding, relatively strong support was found for
H4 from the tests of hypothesis on two means. With long jobs, the critical path
methods outperformed the other methods tested at a low degree of
multiprogramming and with a middle to high number of processors. These results were
found using the unintelligent task scheduler, FAT, and are corroborated by the
Friedman Test.

In order to understand the low power of discrimination of these results, it is
necessary to investigate the effect of the experimental design on the results obtained.
First consider the effect of maintaining a constant degree of multiprogramming on
the flow time. The flow time, ft, can be calculated as the sum of the individual
flow times of the jobs (finish time $f_i$ minus time entered), but it can also be
calculated as the sum of the degree of multiprogramming (DM) times the time interval
on which that degree is valid:

$$ft = \sum_{k} DM_k \times \Delta t_k ,$$

where $DM_k$ is the degree of multiprogramming during the k-th interval, of length $\Delta t_k$.
If DM is, in fact, constant, this just becomes ft = DM × total time, independent of
scheduling policy! At the end of each of the simulation runs, the remaining jobs are
actually finished, dropping DM to zero. Thus, for example, if 504 jobs are run on 3
processors with 5 jobs in the system, then DM = 5 until 500 jobs have been run and
then it drops to 4, then 3 and so on. This means that the flow time in this example
is

$$ft = 5 \times (\text{makespan of the first 500 jobs completed}) + 4 \times (f_{501} - f_{500}) + \cdots + 1 \times (f_{504} - f_{503}).$$
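The equality of the two ways of computing flow time can be checked on a small example. The following Python sketch is only an illustration: the function names are hypothetical and the entry and finish times are invented for the example (four jobs, with two in the system at a time until the input is exhausted).

```python
def flow_time_by_jobs(entered, finished):
    """Sum over jobs of (time completed - time entered system)."""
    return sum(f - a for a, f in zip(entered, finished))


def flow_time_by_dm(entered, finished):
    """Sum over time intervals of (jobs in the system) x (interval length)."""
    events = sorted([(a, +1) for a in entered] + [(f, -1) for f in finished])
    total, dm, previous = 0, 0, events[0][0]
    for time, change in events:
        total += dm * (time - previous)       # dm was constant since the last event
        dm += change
        previous = time
    return total


# Made-up run with DM = 2: a new job enters whenever one finishes.
entered = [0, 0, 3, 4]
finished = [3, 4, 6, 9]
print(flow_time_by_jobs(entered, finished), flow_time_by_dm(entered, finished))
```

In this run DM stays at 2 until time 6 and then drops to 1 for the last 3 time units, so both computations give 2 × 6 + 1 × 3 = 15.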
Now any scheduling method that does not leave a processor idle unnecessarily can
achieve no shorter makespan than (sum of the 500 smallest $u_i$)/3 and no longer
makespan than (sum of the 499 next largest $u_i$)/3 + largest $u_j$. The difference between
these two is just

     largest $u_j$ + (sum of 3 next largest $u_i$ - sum of 4 smallest $u_i$)/3.

The rest of the terms in ft come to, at most, 10/3 times the longest job processing
time, $u_j$. All together,

     $ft - ft_{opt}$ ≤ 13/3 × longest $u_j$
                    + (sum of 3 next largest $u_i$ - sum of 4 smallest $u_i$)/3.

For 500 jobs, this would amount to something like a 2% difference between the
observed flow time and the optimal value and hence even less between two observed
values.

In general, from the foregoing discussion it can be seen that the only effect that
running a large sample of jobs (such as 500 jobs) has on the given bound on
$ft - ft_{opt}$ is that the longest jobs may be longer and the shortest jobs shorter than
would be the case with a small sample. This would not be the case if random arrival
times were used in the simulation, since a variable degree of multiprogramming
would be produced and hence, presumably, greater variability in the total flow times
would be achieved by the different scheduling algorithms.

Another factor contributing to the homogeneity of the results might have been
the way in which the data files were produced. First, 12 DAGs were created with
the desired characteristics (long, wide, or randomly generated). Then 514 jobs were
created by assigning exponentially distributed random processing times to the tasks
of the DAGs, using the same 12 graphs repeatedly, each time with different
processing times. Whereas the task times are exponentially distributed with mean and
standard deviation of one, the processing times, u(J), of jobs with, say, ten tasks will be
approximately normally distributed with a mean of ten and a standard deviation of
$\sqrt{10}$, or approximately 3.16; that is, the standard deviation relative to the mean is
only $1/\sqrt{10}$, or approximately .32. In other words, there is relatively little variation in
the job sizes and hence less chance for the different policies to exhibit their powers.

2.5.  Towards  a  Theory  of  Program  Size

It is evident from looking at the scheduling strategies given in the previous
section that the central idea to all of them (except the random strategy) is to select first
the "smallest" job, in some sense of the word, and then to run that job as quickly as
possible. The idea of "run as many of the jobs as quickly as possible" has strong
appeal and is known in its guise of SJF to be optimal for non-preemptive static
scheduling of nondecomposable jobs (jobs consisting of a single task). Nonetheless, it
is not easy to specify exactly what is a "small" job when extending this idea to jobs
which consist of a collection of precedence-related tasks which are to be run on
several processors. Although the critical path time (CPT) and the total processing
time (TPT) are well-worn measures of job size, neither one always tells us which job
can be finished more quickly; however, we may conclude that the job's run time will
be greater than or equal to S = max{ CPT, TPT/m } if there are m processors. While
this suggests using S as the measure of size, examples can be found to show that
smallest "S" first is not an optimal strategy either.

One approach to improving the scheduling strategy may be to design an easily
calculable measure of job "size" which more closely identifies how long it should take
to run the job on m identical processors. Such a measure can be obtained by
extending the results established by Hu in his 1961 paper [HU 61], where the attention was
restricted to unit-time tasks in an in-tree precedence graph. Following his ideas, we
make the following definitions:
(1)  Assign a height, h(T), to each task T in the precedence graph of a job J by
setting

a)  h(T) = u(T) (the processing time of T) if T has no successors, or

b)  h(T) = u(T) + max{ h(T') : T' a successor of T } if all successors of T
have a height assigned.

(2)  Assign a depth, d(T), to each task T in exactly the same way as h(T) was
defined, but substituting "predecessor" for "successor" in all places in the
definition.

(3)  Let e = min{ u(T) : all T in J }.

(4)  For each natural number j, let $Q_j$ = { T : h(T) - u(T) ≥ j × e }.

(5)  Likewise define $R_j$ = { T : d(T) - u(T) ≥ j × e }.

(6)  Let CPT = critical path time = max{ h(T) : all T in J }
= max{ d(T) : all T in J }.

(7)  For any set C of tasks, let TPT(C) = total processing time of C = $\sum_{T \in C} u(T)$.

(8)  Define

$$Size(J) = \max\{\, CPT,\ \max\{ j \times e + TPT(Q_j)/m : 0 \le j \le \lfloor CPT/e \rfloor \},\ \max\{ j \times e + TPT(R_j)/m : 0 \le j \le \lfloor CPT/e \rfloor \} \,\}.$$

The quantity Size(J) so defined is readily calculated in time O(n), where n is
the number of tasks, and so can be efficiently used for setting job priorities in a
scheduling algorithm. Size(J) gives a reasonable lower bound on the time to
complete J under any schedule of J on m processors; hence selecting first the job J with
smallest Size(J) would be a reasonable strategy. In order to give more concrete
support to this statement, we turn to Hu's original work [HU 61] in which he shows
that, when all tasks are unit length and the precedence relation forms an in-tree,
Size(J) is the minimum time required to execute J.
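Definitions (1) through (8) translate directly into a short program. The Python sketch below is an illustration under stated assumptions rather than the author's implementation: the function name `size_measure` is hypothetical, tasks are numbered from 0 so that every predecessor has a smaller index than its successors, and the straightforward loop over j makes no attempt to reach the O(n) running time mentioned in the text.

```python
import math


def size_measure(u, succ, m):
    """Size(J) for task times u, successor lists succ, and m processors."""
    n = len(u)
    pred = [[] for _ in range(n)]
    for t in range(n):
        for s in succ[t]:
            pred[s].append(t)
    h = [0.0] * n                             # heights, definition (1)
    for t in reversed(range(n)):
        h[t] = u[t] + max((h[s] for s in succ[t]), default=0.0)
    d = [0.0] * n                             # depths, definition (2)
    for t in range(n):
        d[t] = u[t] + max((d[p] for p in pred[t]), default=0.0)
    e = min(u)                                # definition (3)
    cpt = max(h)                              # definition (6)
    best = cpt
    for j in range(math.floor(cpt / e) + 1):
        tpt_q = sum(u[t] for t in range(n) if h[t] - u[t] >= j * e)   # TPT(Q_j)
        tpt_r = sum(u[t] for t in range(n) if d[t] - u[t] >= j * e)   # TPT(R_j)
        best = max(best, j * e + tpt_q / m, j * e + tpt_r / m)
    return best


# Illustrative in-tree: four unit-time leaves feeding one unit-time root, two processors.
in_tree_size = size_measure([1, 1, 1, 1, 1], [[4], [4], [4], [4], []], 2)
print(in_tree_size)
```

For this in-tree the four leaves need two time units on two processors and the root one more, so the value 3 agrees with the actual minimum makespan, whereas max{ CPT, TPT/m } gives only 2.5.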
PROPOSITION 2.1: If J is a collection of unit-time tasks, $T_1, T_2, \ldots, T_n$, related
by an in-tree precedence relation, then h(T) = the number of tasks in the chain from
T to the root, $Q_j$ = { T : h(T) ≥ j + 1 }, and

$$\lceil Size(J) \rceil = \max\{\, CPT,\ \max\{ j + \lceil \#(Q_j)/m \rceil : 0 \le j \le CPT \} \,\}$$

is the minimum time required to complete all the tasks.

PROOF: See Hu [HU 61], pp. 844-847. Notice that the terms with $R_j$ play no
role when the DAG is an in-tree since the number of tasks at each depth, starting
with the leaves at depth one, must form a non-increasing sequence. As long as there
are m or more tasks at each depth, $j + \lceil \#(R_j)/m \rceil$ will form a non-increasing
sequence as well, while once there are fewer than m leaves at level j, the same
sequence will be non-decreasing. The maximum value of the sequence must
therefore be at j = 0, giving TPT/m, or at j = CPT, giving CPT.

In cases involving tasks which are not unit time or in which the precedence relation
is not an in-tree, Size(J) is still a lower bound on the makespan of any schedule for J,
since the tasks in $Q_j$ (for any j) cannot be run on m processors in less time than
$TPT(Q_j)/m$. After the last task T in $Q_j$ has been run, its successor tasks cannot be
completed in less time than any chain of them, the longest having length
h(T) - u(T) ≥ je by the definition of $Q_j$. A symmetric argument shows that each
$je + TPT(R_j)/m$ is also a lower bound. On the other hand, if the number of tasks at
about the same height (or depth) varies widely, Size(J) may underestimate the
makespan of J. It is shown below that it never underestimates by more than a factor of
two.

PROPOSITION 2.2: Let J be a job consisting of tasks related by an arbitrary
precedence, and S(J) = max{ CPT(J), TPT(J)/m }. If F(J) represents the
makespan (finish time) of J when run on m processors by any scheduler that never leaves
a processor idle unless necessary, then

$$\frac{F(J)}{Size(J)} \le \frac{F(J)}{S(J)} \le 2 - \frac{1}{m}. \tag{12}$$

Moreover, for each m there exists a job with unit tasks for which the ratio
F(J)/Size(J) is arbitrarily close to 3/2 - 1/2m.

PROOF: Let makespan = x + y, where x time units are spent with all m
processors busy and y time units are spent with at least one idle processor. At any time
at which there is an idle processor all of the available tasks (and hence all of the
highest level tasks) must be running, so CPT is being reduced. Therefore
y ≤ CPT. If, on the other hand, all m processors are busy, then TPT is being
reduced by m times the length of time; hence, mx ≤ TPT on this time interval.
Finally, at least one processor is always busy, so whenever CPT is reduced by dt, so
is TPT; hence, mx + y ≤ TPT. Therefore the whole job takes time
x + y = (mx + my)/m = (mx + y)/m + (m-1)y/m ≤ TPT/m + (m-1)CPT/m. This gives

$$\frac{F(J)}{Size(J)} \le \frac{F(J)}{S(J)} \le \frac{(m-1)\,CPT + TPT}{m \times \max\{CPT,\ TPT/m\}} = \frac{(m-1)\,CPT}{m \times \max\{CPT,\ TPT/m\}} + \frac{TPT/m}{\max\{CPT,\ TPT/m\}} \le \frac{m-1}{m} + 1 = 2 - \frac{1}{m}.$$
To prove the second part, consider, for any m > 1, the job consisting of
2r + n unit-time tasks with precedence relations

$T_k \to T_{k+1}$, for k = 1, 2, ..., r-1;

$T_r \to T_k$, for k = r+1, r+2, ..., r+n;

$T_k \to T_{r+n+1}$, for k = r+1, r+2, ..., r+n;

and finally

$T_k \to T_{k+1}$, for k = r+n+1, ..., 2r+n-1.

Figure 2.3.  A Worst-Case Precedence Graph

Further suppose that r = pm for some integer p and that n = r(m-1) + m. Then
it is easily calculated that

$$CPT = 2r + 1,$$

$$TPT/m = (2r+n)/m = (r(m+1)+m)/m = r(1 + 1/m) + 1 < CPT,$$

and

$$j + \#(Q_j)/m = j + (TPT - j)/m = j + (m(r+1) + r - j)/m = r + 1 + j + (r - j)/m, \quad \text{for } j = 1, 2, \ldots, r.$$

For j ≤ r, r + 1 + j + (r - j)/m ≤ 2r + 1, so $j + \#(Q_j)/m$ ≤ CPT. For j > r,
$\#(Q_j)$ decreases even more and hence $j + \#(Q_j)/m$ is never larger than CPT.
Finally, by symmetry, $j + \#(R_j)/m$ is never larger than CPT, so that
Size(J) = CPT = 2r + 1. To calculate F(J) on the other hand, any schedule must
run the first r tasks and the last r tasks sequentially; therefore no schedule can run
this job in less than

$$2r + \lceil n/m \rceil = 2r + \lceil r + 1 - r/m \rceil = 2r + 1 + (r - r/m) = 2pm + 1 + (pm - p) = 3pm - p + 1,$$

giving

$$\frac{F(J)}{Size(J)} \ge \frac{p(3m-1)+1}{2pm+1} = \frac{3m - 1 + 1/p}{2m + 1/p}.$$

This ratio is always less than (3m-1)/2m = 3/2 - 1/2m and gets arbitrarily close to
it as p (and hence r and n) gets large.
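Both parts of Proposition 2.2 can be checked numerically on small instances. The Python sketch below is an illustration under stated assumptions, not part of the original analysis: it assumes the chain, fan-out, fan-in, chain shape described above for the worst-case job (tasks numbered from 0 here), uses a simple first-come list scheduler as one example of a scheduler that never leaves a processor idle unnecessarily, and all function names are hypothetical.

```python
import heapq
from collections import deque


def worst_case_job(p, m):
    """Unit-task job: a chain of r tasks, n parallel tasks, then a chain of r tasks."""
    r = p * m
    n = r * (m - 1) + m
    total = 2 * r + n
    succ = [[] for _ in range(total)]
    for k in range(r - 1):
        succ[k].append(k + 1)                 # first chain
    middle = range(r, r + n)
    succ[r - 1].extend(middle)                # fan out from the end of the first chain
    for k in middle:
        succ[k].append(r + n)                 # fan in to the start of the last chain
    for k in range(r + n, total - 1):
        succ[k].append(k + 1)                 # last chain
    return [1] * total, succ


def greedy_makespan(u, succ, m):
    """Makespan of a list schedule that starts a ready task whenever a processor is free."""
    n = len(u)
    unfinished_preds = [0] * n
    for t in range(n):
        for s in succ[t]:
            unfinished_preds[s] += 1
    ready = deque(t for t in range(n) if unfinished_preds[t] == 0)
    running = []                              # heap of (finish time, task)
    now, idle, done = 0, m, 0
    while done < n:
        while ready and idle > 0:
            task = ready.popleft()
            heapq.heappush(running, (now + u[task], task))
            idle -= 1
        now, task = heapq.heappop(running)    # advance to the next completion
        idle += 1
        done += 1
        for s in succ[task]:
            unfinished_preds[s] -= 1
            if unfinished_preds[s] == 0:
                ready.append(s)
    return now


p, m = 2, 3
u, succ = worst_case_job(p, m)
finish = greedy_makespan(u, succ, m)
cpt = 2 * p * m + 1                           # Size(J) = CPT = 2r + 1 for this job
print(finish, cpt, finish / cpt, 1.5 - 1 / (2 * m))
```

For p = 2 and m = 3 the makespan is 3pm - p + 1 = 17 against Size(J) = 13, a ratio of about 1.31, just under 3/2 - 1/2m (about 1.33) and within the bound TPT/m + (m-1)CPT/m from the first part of the proof.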

Whereas  either  CPT  or  TPT/m  by  itself  often  makes  reasonable  estimates  of 
the  size  of  a  job,  there  is  no  limit  on  how  large  the  ratio  of  makespan  to  either  of 
these  measures  can  become  as  m  tends  to  infinity.  Nonetheless,  as  we  have  just  seen,
Size{J)  is  never  less  than  half  the  makespan.  There  are,  however,  even  more  clever 
measures  for  the  "size"  of  J,  but  the  more  clever  these  measures  become  the  more 
time  is  required  to  compute  them.  It  is,  after  all,  possible  (in  exponential  time)  to 
determine  exactly  how  long  the  optimal  makespan  of  any  job  is  and  use  that  as  the 
size! 

2.6.  Conclusions

This chapter has explored a number of avenues leading to more efficient
scheduling methods for running concurrent programs on general-purpose
multiprocessor systems. In order to reduce the turnaround times of such jobs, a number of
extensions of the traditional Shortest Job First algorithm have been presented, and a
worst-case analysis of one of these, called Concurrent Shortest Job First (CSJF) or
just Shortest Job First, was made. This analysis indicates that the concurrent form
of the algorithm can do no worse than its sequential counterpart. It also indicates
that the ratio of the turnaround time of CSJF to the optimal turnaround time is
bounded by m, the number of processors, and also by a multiple of the ratio of the
longest to the shortest job.

As  a  result  of  such  analysis,  one  is  led  to  test  CSJF  against  other  scheduling 
methods  to  see  if  it  produces  the  lowest  average  turnaround  time.  The  results  of  a 
simulation  experiment  are  presented  and  analyzed  in  Sections  2.3  and  2.4,  comparing 
CSJF  with  several  other  related  scheduling  methods  and  with  a  random  scheduler. 
These  results  indicate  mostly  negligible  differences  among  the  various  schedulers, 
with  the  methods  based  on  the  Shortest  Critical  Path  First  strategy  faring  the  best  in 
the  significant  cases.  A  critique  of  the  design  of  the  simulation  pointed  out  possible 
explanations  for  the  small  differences  observed  and  indicated  ways  to  improve  future 
simulation  experiments. 

Finally,  Section  2.5  developed  a  more  sophisticated  measure  of  the  "size"  of  a 
concurrent  program,  or  of  a  directed  acyclic  graph,  in  terms  of  both  critical  path  and 
processing  time  of  certain  subsets  of  the  program.  This  value,  Size(J),  is  shown  to  be 
a reasonably good lower bound for the optimal run time (makespan) of the job J,
better than critical path time (CPT) or total processing time (TPT), and hence a
good candidate for a "Shortest Size First" scheduling algorithm.


CHAPTER III
LOOSELY  COUPLED  SYSTEMS 

Although it would seem that allowing multiple processors to share the same central
memory store would make their cooperative efforts simpler and faster, such
memory sharing creates a quantity of difficulties that grows rapidly with increasing
numbers of cooperating CPUs. The major problems here are first to provide the
necessary hardware for multiple direct access to the memory, and second to assure,
by whatever means, fast and conflict-free access to the memory for all processors.
Solutions to these problems become expensive and complex, but experimentation
continues in many directions by many organizations. The alternative is to provide a
separate memory for each CPU (or small group of CPUs), forcing communication
between two CPUs to be done by some kind of message passing. These are the
"loosely coupled" systems. The obvious disadvantage of overhead associated with the
transmission of messages is often outweighed by the savings in hardware complexity
and by increased flexibility. Naturally, the type of concurrent problem solving
appropriate on a tightly coupled system might be inappropriate on a loosely coupled
one: fine-grained concurrency such as sharing the evaluation of parts of an
expression can create speed-up with shared memory, but the communication delays would
make this sharing useless in a loosely coupled system.

The scheduling strategies which are appropriate for a tightly coupled system
may also not be adequate for one which is loosely coupled. Moreover, even if all
processors can run tasks at the same rate, the type of communication links among the
processors may dictate that certain combinations of task-processor assignments are
more efficient than others. The hypercube architecture, for example, has each
processor in direct communication with some neighboring processors, but messages to
other processors must be forwarded by one or more intermediate processors. This, of
course, means that scheduling successive tasks on the same processor or near
neighbors should produce less communication overhead and shorter makespans than if
these tasks were spread over processors separated by more intermediate links. This
is a different kind of difficulty from those encountered in the "classical" scheduling
problems discussed in the foregoing chapters. It, therefore, leads to a scheduling
model considerably different from the one in Chapter II and also considerably
different from that studied by former scheduling theory researchers. In industrial
scheduling environments, setup times and machine differences have been considered,
but apparently not delays which depend on which machines were used to process
predecessor tasks.

This chapter also looks at a different performance measure: to minimize the
total time, or makespan, required to complete a given set of precedence-related tasks
on a loosely coupled system. The first section deals with the more detailed
assumptions that are made on the communications between processes in order to develop a
tractable scheduling model. Section 3.2 then presents the main result, which is an
optimal scheduling algorithm for a particular case of the new model. The final
section discusses some extensions of the basic algorithm which are significant in their
own right.

3.1.  Scheduling and Communication

In discussions of the classical scheduling problems without communication
overhead, the time required to make the scheduling decisions themselves is usually
ignored, even in the dynamic case. This assumption of no scheduling overhead is also
made in this section; however, further assumptions as to the nature of the
communication overhead are also necessary now. As seen in the last section, the type of
architecture substantially affects the appropriate assumptions. Nonetheless, there are
a few basic requirements that are imposed throughout the remainder of this chapter:

(1)  All communications consist of a number of "message units." The number of
messages, m(T, T'), which must be sent from one task T to an immediate
successor task T' is a fixed integer > 0, independent of the processors on which T
and T' are scheduled.

(2)  The time, d(P, P'), required to send one message unit from processor P to
processor P' in the absence of contention is a system constant depending only on P
and P'. Moreover, d(P, P') = 0 if and only if P = P'. The time required for
the channel protocol to schedule message transmission is constant and forms
part of the time d(P, P').

(3)  Communication protocols are collision-free, so that no messages are lost and all
messages are sent in a finite amount of time.

(4)  In the presence of contention for a particular channel, the time required to
transmit a collection of message units is just the sum of their transmission times
plus the transmission times of any messages for which they must wait.

(5)  The  channel  processors  are  independent  of  the  task  processors,  implying  that  all 
processors  may  be  running  tasks  at  the  same  time  that  communication  is  taking 
place  among  the  processors. 

(6)  All  messages  are  sent  at  the  time  of  completion  of  the  originating  task,  and  the 
receiving  task  cannot  begin  until  all  messages  are  received  from  all  preceding 
tasks. 
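Assumptions (1), (2), and (6) together determine the earliest moment at which a task may begin on a given processor when there is no contention. The following is a minimal sketch of that calculation; the function name and the dictionary-based data layout are assumptions made for illustration, and contention (assumption (4)) is deliberately ignored.

```python
def earliest_start(task, proc, preds, finish, placed, msgs, d):
    """Earliest time `task` may start on processor `proc`, ignoring contention.

    preds[task]     : list of immediate predecessors of `task`
    finish[t]       : finish time of an already scheduled task t
    placed[t]       : processor on which t was run
    msgs[(t, task)] : number of message units m(T, T'), assumption (1)
    d(p, q)         : per-unit delay between processors, with d(p, p) == 0,
                      assumption (2)
    """
    ready = 0.0
    for t in preds.get(task, []):
        # Assumption (6): messages leave when t completes, and `task`
        # must wait for every one of them.
        arrival = finish[t] + msgs[(t, task)] * d(placed[t], proc)
        ready = max(ready, arrival)
    return ready
```

The result is only a lower bound on the actual start time, since the chosen processor may still be busy with another task.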

It is instructive to look at the effect of assuming, or not assuming, each one of
these restrictions. The first two make the possible set of communication delays a
discrete, finite set: without (1), messages could be of arbitrary lengths, while without
(2), the amount of time necessary to send the same message might vary from one
moment to another. Of course, if there is contention for the communication channel,
then one or more messages may have to wait for the transmission of others, thereby
changing the time required for the messages to be received. Notwithstanding this
complication, assumptions (3) and (4) guarantee that this time will always be finite
and predictable, allowing for deterministic scheduling policies. If one of these
restrictions were not to hold, then the scheduling problem would be non-deterministic, as
there would be no way to predict exact communication delays. Assumption (5)
simply says that communications are not to be treated as extra tasks to be scheduled
and executed on the given processors: this corresponds to a hardware assumption of
"intelligent" I/O processors. The final restriction, assumption (6), is the least critical.
While (5) makes it clear that there can be "overlapping" of communication times and
execution times, (6) says that communication between T and T' cannot overlap
either T or T'. This final assumption could be relaxed and still produce a
deterministic scheduling problem.

Each of the foregoing assumptions on the nature of the communications
corresponds to certain assumptions on the nature of the hardware and
communications software of the system. In the case of a system such as the hypercube, which is
not fully connected, Assumption (2) must be modified to apply only to two processors
which can communicate directly with one another; beyond that, the time required to
send a message from P to P' must be calculated on the basis of the route chosen and
the contention encountered on each leg of the communication route.
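As a rough illustration of the routing calculation for a hypercube, the sketch below estimates the delay along a shortest route. It rests on several assumptions that are not stated in the text: processors are labeled by binary addresses with neighbors differing in exactly one bit, every link has the same per-unit delay, each intermediate processor forwards a message only after receiving it, and there is no contention. The function names are illustrative.

```python
def hops(p: int, q: int) -> int:
    """Number of links on a shortest route between hypercube nodes p and q.

    With binary node labels this is the Hamming distance of the labels.
    """
    return bin(p ^ q).count("1")


def route_delay(p: int, q: int, units: int, link_delay: float) -> float:
    """Contention-free delay for `units` message units sent from p to q,
    assuming every leg of the route costs `link_delay` per unit."""
    return hops(p, q) * units * link_delay
```

Under these assumptions the delay is zero exactly when p equals q, which agrees with the requirement in Assumption (2).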

A large number of static, deterministic scheduling problems can now be
precisely stated under these assumptions. This is done in Hwang et al. [HWAN86],
which presents several different problems of this kind and indicates that there are
thousands more depending on the selection of the architecture and the traditional
parameters of the system. It would be of great interest to begin to draw the
boundaries between the NP-hard problems and the polynomial-time problems in this new
problem space.

Perhaps the most encompassing attempt to attack the general problem of
minimizing the makespan of a schedule in the presence of a general DAG and
communication delays is found in the Ph.D. dissertation of J.-J. Hwang [HWAN87], in which
he presents an intelligent heuristic algorithm called Earliest Task First (ETF). ETF
is a "greedy" strategy which, at any moment that a processor becomes free, attempts
to schedule some task as early as possible on that processor. Although the strategy is
not optimal, Hwang establishes a worst case bound on its performance which is cited
in Proposition 3.1.

PROPOSITION 3.1: Given a set of tasks with a general precedence relation and
given a loosely coupled system of m identical processors satisfying the conditions (1)
to (6) above, let $M_{ETF}$ be the makespan of a schedule produced by ETF and $M_{opt}$ be
the optimal makespan. Then

\[
M_{ETF} \le \left(2 - \frac{1}{m}\right) \times M_{opt} + MaxChainComm, \tag{1}
\]

where MaxChainComm is the maximum sum of the form $\sum MaxDelay \times m(i, j)$, the
sum taken along any chain of tasks in the system. MaxDelay is the maximum of all
communication parameters d(P, P') taken over all pairs of processors.
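The quantity MaxChainComm is a longest-path value over the precedence DAG, with each edge weighted by MaxDelay times its message count. The sketch below computes it and evaluates the right-hand side of (1). It assumes, for illustration only, that tasks are numbered 1 to n so that every edge i → j has i < j, and that message counts are held in a dictionary keyed by edge; neither the names nor the data layout come from the source.

```python
def max_chain_comm(n, msgs, max_delay):
    """Largest sum of max_delay * m(i, j) taken along any chain of tasks.

    n         : number of tasks, numbered 1..n with i < j for every edge i -> j
    msgs      : dict mapping each edge (i, j) to its message count m(i, j)
    max_delay : maximum of d(P, P') over all processor pairs
    """
    best = [0.0] * (n + 1)          # best[j] = heaviest chain ending at task j
    # Visiting edges by increasing target guarantees best[i] is final
    # before any edge leaving i is examined.
    for i, j in sorted(msgs, key=lambda edge: edge[1]):
        best[j] = max(best[j], best[i] + max_delay * msgs[(i, j)])
    return max(best)


def etf_bound(m_opt, m, chain_comm):
    """Right-hand side of inequality (1)."""
    return (2 - 1 / m) * m_opt + chain_comm
```

Since the optimal makespan is generally unknown, the bound is of analytical rather than computational interest; the second function simply evaluates the formula for given values.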

3.2.  An Algorithm for Precedence Trees

In order to obtain an optimal scheduling algorithm in the case of non-negligible
communication times, it is necessary to make even greater restrictions on the
problem than those imposed in Section 3.1. To begin with, the corresponding problem
without communication delays must be solvable in polynomial time. As indicated at
the outset of the chapter, this discussion focuses only on the problem of minimizing
the makespan of a set of tasks. Since there would be no communication if the tasks
were independent, it is assumed that there is a precedence relation among them,
representable by a DAG as always. Since in the classical case, minimizing the
makespan is NP-hard even for unit execution times (UET) [ULLM75], further restrictions
must be imposed. For two processors and UET, an algorithm by Fujii, Kasami, and
Ninomiya is optimal [FUJI69], as is the better-known algorithm of Coffman and
Graham [COFF72]. In this case, allowing two different possible execution times again
makes the problem intractable [ULLM75]. On the other hand, restricting the
precedence relation to an opposing forest (each connected component of the DAG is a
tree) and restricting the number of processors to any fixed constant again produces a
polynomial time scheduling problem [GARE83]. If m is an arbitrary parameter of
the problem, polynomial time solutions are still possible if the DAG is further
restricted to be a forest of in-trees (all out-degrees ≤ 1) [HU61] or a forest of out-trees
(all in-degrees ≤ 1) [BRUN82]. The general case of arbitrary m and an opposing
forest remains open [DOLE85] in the classical case.

The obvious problems to examine in the case of significant communication
overhead, therefore, are those with UET and either two processors or with many
processors and in-tree or out-tree forests. An algorithm which the author conjectures is
optimal for in-tree forests and arbitrary m is presented in the next section: here
different assumptions are introduced. Whereas in the classical case, if there are more
processors than tasks then the scheduling problem becomes trivial, in the new
situation, the problem of allocation of tasks to processors still remains difficult. Consider
the example of the DAG given in Figure 3.1, where each task is assumed to have
unit execution time. With no communication overhead, this can be scheduled on two
processors, as in Figure 3.2 (a), to execute with an optimal makespan of three. If
communication times of 0.5 are supposed between any combination of tasks and
processors, then wherever $T_4$ is scheduled, it will have to start at least 0.5 time unit
later than either $T_2$ or $T_3$, so Figure 3.2 (b) shows an optimal schedule in time 3.5.
If the delays due to the communications varied between different tasks and
processors, then the assignment to processors would be even more critical and the
makespan could be even longer. However, if the delays become greater than one,
then the schedule of length four on one processor, shown in Figure 3.2 (c), becomes
optimal.

Figure 3.1.  A DAG of Unit-Time Tasks

Figure 3.2.  Three Optimal Schedules

Still working with the same DAG of Figure 3.1, another significant idea emerges
if one considers the case where communication time is, say, 1.5 between $T_1$ and its
immediate successors but only 0.5 between these successors and $T_4$. Then it would
appear that again Figure 3.2 (c) would be an optimal schedule since placing $T_2$ and
$T_3$ on different processors would make one of them wait until time 2.5 to start and
produce a schedule of length 4.5. This can be avoided, however, by running $T_1$ on
both $P_1$ and $P_2$. This extra use of memory space and processor time allows the
creation of the optimal schedule of length 3.5 shown in Figure 3.3. (See [PAPA87].)

Figure 3.3.  An Optimal Schedule with Task Duplication

It  is  clear,  therefore,  that  assuming  "sufficient"  processors  to  run  all  available 
tasks  at  any  moment  does  not  trivialize  the  scheduling  problem  with  communication 
overhead  the  way  it  does  the  classical  problem.  In  fact,  without  further  assumptions 
the  problem  remains  quite  complex.  The  algorithm  presented  below  is  shown  to  be 
optimal  under  the  additional  restriction  that  the  communication  delays  are  not  longer 
than  the  task  execution  times;  notwithstanding,  it  also  works  without  the  usual  UET 
assumption. 
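The schedule lengths quoted for the example of Figure 3.1 can be reproduced with a small evaluator. This is a sketch under a stated assumption: since the figure itself is not reproduced here, the DAG is taken to be the four-task "diamond" in which task 1 precedes tasks 2 and 3, which both precede task 4, as the discussion suggests. The function name and data layout are illustrative only.

```python
def makespan(placement, preds, delay, exec_time=1.0):
    """Length of the schedule obtained by running task copies greedily.

    placement : list of (task, processor) pairs in execution order; a task
                may appear more than once (task duplication)
    preds     : dict mapping a task to its immediate predecessors
    delay     : function (pred, task) -> delay when the two run on
                different processors (zero on the same processor)
    """
    free = {}   # processor -> time at which it becomes idle
    done = {}   # task -> list of (processor, finish) for every copy run
    for task, proc in placement:
        start = free.get(proc, 0.0)
        for pred in preds.get(task, []):
            # Use whichever copy of the predecessor delivers earliest.
            arrival = min(f + (0.0 if q == proc else delay(pred, task))
                          for q, f in done[pred])
            start = max(start, arrival)
        finish = start + exec_time
        free[proc] = finish
        done.setdefault(task, []).append((proc, finish))
    return max(f for copies in done.values() for _, f in copies)
```

With the assumed diamond, placing tasks 1 and 2 on one processor and tasks 3 and 4 on another gives length 3.5 for uniform delays of 0.5, while duplicating task 1 on both processors recovers length 3.5 when its outgoing delay is 1.5.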

Assume that there are given n tasks, $T_1$, $T_2$, ..., $T_n$, satisfying a precedence
relation and m identical processors, $P_1$, $P_2$, ..., $P_m$. Assume that the processors are
loosely coupled, so that the communication time between them is not negligible with
respect to the processing times of the tasks. Let $u_i$ represent the processing time of
task $T_i$, and assume that the six communication assumptions of Section 3.1 hold in
this system. Finally, the following restrictions should be assumed:

•  In-Forest Precedence: The precedence DAG of the tasks is in the form of a
forest of in-trees (all nodes with outdegree ≤ 1).

•  Sufficient Processors: m ≥ n. (In fact, m ≥ the number of leaves of the forest
is a sufficient condition.)

•  Short Communication Delays: The time required for any task to communicate
its results to an immediate successor is less than or equal to min{ $u_i$ : 1 ≤ i ≤ n } in
all cases and is equal to 0 if both are scheduled on the same processor.

•  Identical Links: The time d(P, P') required to send a message unit from P to
P' is constant, independent of the processors. (It is, of course, 0 if P = P'.)

•  Fully Connected: All processors can communicate directly with all others
without contention. Thus any number of processors may communicate with any
others simultaneously.

The following algorithm determines an assignment of the n tasks to m
processors (for sufficiently large m) in such a way as to minimize the makespan of running
all the tasks. It uses the scheduling strategy of joining each task with that
predecessor which would otherwise cause the longest delay: for that reason it is named Join
Latest Predecessor.

Algorithm 3.1.  JLP (Join Latest Predecessor)

Input:  Tasks 1, 2, ..., n, with processing times $u_1$, $u_2$, ..., $u_n$; precedence
relation → such that for any i, i → j for at most one j; communication
delays c[i, j] for each i → j such that c[i, j] ≤ $u_k$ for all i, j, k.
Further assume that the tasks are numbered such that i → j implies
that i < j ("topological order").

Output:  For each i ≤ n, 3 numbers: P(i), indicating the processor on which task
i should be scheduled; S(i) and F(i) indicating the start time and finish
time, respectively, of task i on processor P(i).

BEGIN
1.   FOR j = 1 TO n
     BEGIN
        IF j is a leaf (no predecessors)
1.1.    THEN Set P(j) = j, S(j) = 0, F(j) = u_j.
             { j is now "scheduled." }
1.2.    ELSE
        BEGIN
1.2.1.     Find an immediate predecessor k such that F(k) + c[k, j] is
           maximum for all immediate predecessors of j.
1.2.2.     Set P(j) = P(k).
           { Assures that j need not be delayed by c[k, j]. }
1.2.3.     Set S(j) = max{ F(k), max{ F(i) + c[i, j] : i → j and i ≠ k } }.
           { j will start when k finishes or when the latest communication
             arrives from its other immediate predecessors. }
1.2.4.     Set F(j) = S(j) + u_j.
           { j is now "scheduled." }
        END ELSE
     END FOR
END JLP.

It is now necessary to establish that this algorithm does, in fact, produce the
best possible assignment of tasks to processors in the sense that the schedule
produced minimizes the makespan. This is the content of the next theorem.

THEOREM  3.1:  The  schedule  produced  by  the  JLP  algorithm  yields  the  minimum 
possible  makespan  under  the  five  hypotheses  set  out  before  the  algorithm. 

PROOF:  That the schedule is feasible is clear from Step 1.2.3, which starts the task
j only after all predecessors have finished and all messages have had time to arrive.
Step 1.1, of course, depends upon having sufficient processors, but note that the
algorithm produces a feasible schedule even without the Short Communication Delays
assumption.

Now suppose, for the sake of contradiction, that there is a better schedule than
that of JLP which gives finish times F'(i), for i = 1, 2, ..., n. Then there must be a
j such that F'(j) < F(j) and F'(i) ≥ F(i) for i < j. Since all leaves have
F(i) = $u_i$, such a j cannot be a leaf. Let k be the predecessor of j found in Step
1.2.1 of the algorithm. k < j due to the topological order, so by the choice of j,
F'(k) ≥ F(k); hence j cannot start before F'(k) + c[k, j] ≥ S(j) unless j is run on
the same processor as k. Similarly, if i is any other predecessor of j, then
F'(i) ≥ F(i). Therefore, if j runs on the same processor as k, it cannot start before
the time S(j) given in Step 1.2.3 unless some other immediate predecessor, say r,
runs on the same processor as k and j. In this case, if F'(r) < F'(k), then
F'(k) ≥ F'(r) + $u_k$ ≥ F'(r) + c[r, j] by the short communications assumption. But
F'(r) + c[r, j] ≥ F(r) + c[r, j], so scheduling r on the same processor as j does not
allow j to start any earlier than F'(k) ≥ F(r) + c[r, j], which is no improvement
over JLP. Symmetrically, if r runs on the same processor as j and k and if
F'(r) > F'(k), then it follows that F'(r) ≥ F(k) + c[k, j], which again offers no
improvement over JLP. Therefore S'(j) ≥ S(j), a contradiction. It follows that no
other schedule produces a shorter makespan than JLP.  □
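The steps of Algorithm 3.1 translate almost directly into code. The following Python rendering is a sketch rather than the author's implementation: the function name, the dictionary-based inputs, and the use of each leaf's own number as its processor identifier (as in Step 1.1) are choices made for illustration. It assumes tasks are numbered 1 to n in topological order.

```python
def jlp(u, succ, c):
    """Join Latest Predecessor on a forest of in-trees.

    u    : dict task -> processing time, tasks numbered 1..n with i < j
           whenever i precedes j
    succ : dict task -> its unique immediate successor (roots are absent)
    c    : dict (i, j) -> communication delay for each edge i -> j
    Returns dictionaries (P, S, F): processor, start time, finish time.
    """
    n = len(u)
    preds = {j: [] for j in u}
    for i, j in succ.items():
        preds[j].append(i)

    P, S, F = {}, {}, {}
    for j in range(1, n + 1):
        if not preds[j]:
            # Step 1.1: a leaf gets its own processor and starts at time 0.
            P[j], S[j] = j, 0.0
        else:
            # Step 1.2.1: predecessor whose message would arrive latest.
            k = max(preds[j], key=lambda i: F[i] + c[(i, j)])
            # Step 1.2.2: join that predecessor's processor.
            P[j] = P[k]
            # Step 1.2.3: wait for k itself and for every other message.
            others = [F[i] + c[(i, j)] for i in preds[j] if i != k]
            S[j] = max([F[k]] + others)
        # Step 1.2.4 (also sets the finish time of a leaf).
        F[j] = S[j] + u[j]
    return P, S, F
```

The makespan is the largest value in F. For clarity this sketch scans each predecessor list twice; a single scan per list is also possible without changing the result.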

An example of a JLP schedule appears in Figure 3.4.

Figure 3.4.  A JLP Schedule. (a) UET, UCT Tasks (b) Finish Times
(c) Processor Assignments (d) JLP Schedule

THEOREM 3.2:  The time complexity of the JLP algorithm is O(n); that is, the
time required to produce the schedule is linear in the number of tasks to be
scheduled. (This assumes that the precedence relation is given in terms of immediate
successors as shown in the algorithm. If the tasks are not in topological order, the
algorithm requires minor revision, but Theorem 3.2 remains true.)

PROOF:  JLP can do no better than O(n), since the main loop, Step 1, executes n
times. This bound, O(n), will be achieved provided that the total number of steps
required, during all n iterations, to find the predecessor k in Step 1.2.1 and to
calculate the value of S(j) in Step 1.2.3 is O(n). In order to do this, a list of immediate
predecessors must be initialized (in O(n) time) for each task during the input phase.
It is easy to look through the list of predecessors of a task j once only and calculate
the maximum and second largest values required in Steps 1.2.1 and 1.2.3. Since the
DAG is a forest of in-trees, each task appears at most once as a predecessor of some
other task, guaranteeing that over the n iterations of the main loop only O(n) steps
go into these calculations.  □
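The single scan mentioned in the proof can be sketched as follows. The helper name and its input format (pairs of a predecessor and its value F(i) + c[i, j]) are assumptions made for illustration.

```python
def top_two(values):
    """Return (argmax, largest, second largest) in one pass.

    values : iterable of (predecessor, F(i) + c[i, j]) pairs
    The second largest is -infinity when there is only one pair.
    """
    best_i, best, second = None, float("-inf"), float("-inf")
    for i, v in values:
        if v > best:
            # The previous maximum becomes the runner-up.
            best_i, second, best = i, best, v
        elif v > second:
            second = v
    return best_i, best, second
```

With k taken as the returned predecessor, the start time of Step 1.2.3 is the larger of F(k) and the second largest value.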

3.3.  Extensions

There are a number of extensions that can be made to the JLP algorithm under
different relaxations or changes in the hypotheses. An obvious place to start is to
look at a forest of out-trees, keeping the other hypotheses of Section 3.2 the same.
In this case, however, if task duplication is allowed, as illustrated in Section 3.2,
Figure 3.3, an essentially trivial algorithm always produces an optimal schedule with no
communication delay at all. The algorithm would, for each leaf j, schedule all the
tasks on the unique path from the root to j on processor j. Since, in an out-tree,
each task has a unique predecessor, the tasks scheduled on any one processor have all
their predecessors scheduled on the same processor and hence there is never any need
for message passing. This procedure requires multiple copies of most tasks (a space
complexity of O(n^2)) but produces a makespan equal to the critical path time of the
DAG, which is always a lower bound on the makespan of any schedule. In order to
implement the algorithm, it is only necessary to obtain, for each task, the list of its
immediate successors and the count of the number of leaves which are successors of
each task. Since this count, for any task, is equal to the sum of the counts of each of
its successors, the count can be calculated, starting with the leaves (highest numbered
tasks), in O(n) time. Once the count of leaves is obtained, call it C(j) for each j,
the scheduler simply schedules, starting with the root, C(j) copies of j on C(j) of
the processors running j's unique predecessor. This also takes O(n) time.
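A minimal sketch of this duplication scheme follows. The function name and data layout are illustrative assumptions; in particular, this version writes out every task copy explicitly by walking from each leaf back to its root, so its running time is proportional to the number of copies produced rather than to n.

```python
def duplicate_schedule(u, parent):
    """Schedule a forest of out-trees with task duplication.

    u      : dict task -> processing time, tasks numbered so that
             parent[j] < j for every non-root task j
    parent : dict task -> its unique immediate predecessor (roots absent)
    Returns (count, schedule): count[j] is the number of leaves below j
    (the C(j) of the text); schedule maps each leaf, used here as the
    processor identifier, to a list of (task, start, finish) triples.
    """
    children = {j: [] for j in u}
    for j, p in parent.items():
        children[p].append(j)

    count = {}
    for j in sorted(u, reverse=True):      # highest numbered tasks first
        kids = children[j]
        count[j] = 1 if not kids else sum(count[s] for s in kids)

    schedule = {}
    for leaf in (j for j in u if not children[j]):
        path = [leaf]
        while path[-1] in parent:          # climb to the root
            path.append(parent[path[-1]])
        t, row = 0.0, []
        for task in reversed(path):        # run root first, leaf last
            row.append((task, t, t + u[task]))
            t += u[task]
        schedule[leaf] = row
    return count, schedule
```

Each processor runs one complete root-to-leaf path with no idle time, so the longest row ends at the critical path time.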

Slightly more interesting is the same problem of a forest of out-trees when
duplications of tasks are prohibited. Some communication delays are now
unavoidable, but due to the assumptions of Sufficient Processors and Identical Links, it is
possible to turn the DAG upside down and use JLP. More specifically, define the
dual DAG by replacing i → j by n-j+1 → n-i+1 and setting
c[n-j+1, n-i+1] = c[i, j] for every precedence-related pair. If the original DAG
was an out-forest, an in-forest is obtained, and vice versa. It is clear that the dual of
the dual is again the original DAG and communication times. Now, if JLP is applied
to the dual of the given out-tree, obtaining starting times S(j), finish times F(j), and
a makespan of M, then an optimal schedule for the original problem is obtained by
setting

S'(j) = M - F(n-j+1),
F'(j) = M - S(n-j+1),

and assigning all tasks to the same processors assigned by JLP. If this algorithm is
called JLP', then the following result holds.

THEOREM 3.3:  Given the same assumptions used with the JLP algorithm except
that the precedence relation gives a DAG which is an opposing forest (a disjoint
union of in-trees and out-trees), and if duplicate copies of tasks are not executed on
more than one processor, then scheduling the in-forest on one set of processors with
JLP and the out-forest on a disjoint set of processors with JLP' produces an optimal
schedule with respect to the makespan.

PROOF:  Given sufficient processors, running the in-trees and out-trees on disjoint
sets of processors creates a makespan which is the maximum of the makespans of the
two disjoint sets. If each is optimal, then the larger is the optimal makespan for the
whole opposing forest. It has already been established that JLP is optimal, so it
remains to be shown that JLP' is optimal. But JLP' on an out-tree produces a
schedule the same length as JLP does on the dual in-tree. If there were a shorter
schedule for the out-tree, taking the dual schedule for the dual in-tree would produce
a shorter schedule for the in-tree: a contradiction to the optimality of JLP. The
only problem, therefore, is the feasibility of the JLP' schedule. As mentioned before,
this is a consequence of the Sufficient Processors and Identical Links assumptions:
sending a message from P to P' takes the same time as sending a message from P' to
P. This means that when Step 1.2.3 of JLP is performed, guaranteeing that j does
not start before all its predecessors' messages have been received, it also guarantees
that all of the successors of n - j + 1 in the dual tree do not start before all have
received their messages from their sole predecessor. Hence the feasibility of the
schedule is assured. (If duplications are allowed in scheduling the out-trees, this
symmetry is destroyed since it is not possible to run task j just because every required
message has been received by some copy of j.)  □
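The dual construction behind JLP' can be sketched in a few lines. The function names are illustrative, and the sketch assumes (as the construction implies, though the text does not state it explicitly) that dual task n-j+1 carries the processing time of original task j. The start and finish times passed to the second function are those produced by JLP on the dual in-forest.

```python
def dual(n, c):
    """Dual of a DAG on tasks 1..n.

    Each edge i -> j with delay c[(i, j)] becomes the edge
    (n-j+1) -> (n-i+1) carrying the same delay.
    """
    return {(n - j + 1, n - i + 1): delay for (i, j), delay in c.items()}


def reverse_times(n, S, F):
    """Map JLP times on the dual back to the original tasks.

    S, F : dicts of start and finish times computed for the dual DAG
    Returns (S2, F2) with S2[j] = M - F[n-j+1] and F2[j] = M - S[n-j+1],
    where M is the makespan of the dual schedule.
    """
    M = max(F.values())
    S2 = {j: M - F[n - j + 1] for j in range(1, n + 1)}
    F2 = {j: M - S[n - j + 1] for j in range(1, n + 1)}
    return S2, F2
```

Applying `dual` twice returns the original edge set, which mirrors the remark that the dual of the dual is again the original DAG.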

Having extended the JLP algorithm to handle all opposing forests under the
assumption of Sufficient Processors and Short Communication Delays, it is natural
next to return to in-trees (or forests of in-trees) and ask if JLP will work if one or the
other of these assumptions is removed. It is evident that JLP will continue to
produce feasible schedules with any length communication delays, but it does lose its
optimality. For example, if communication delays are as long as the
total processing time of all the tasks combined, the best thing to do is run all tasks
on a single processor; JLP, however, always insists on starting out with all leaves on
different processors.

Dropping the assumption of Sufficient Processors, in the other direction, leads
JLP into trouble immediately, since Step 1.1 cannot always be carried out. There is,
however, a value to JLP in this case because it always provides a lower bound on the
time required to execute any task in the in-tree.

LEMMA 3.1: Given a DAG of n tasks with communication delays and m identical
processors satisfying all five hypotheses of the JLP algorithm, the time S(j) produced
by JLP is the earliest time that j can be started by any scheduling algorithm.
If the hypotheses of Sufficient Processors and In-Forest Precedence are dropped, S(j)
remains a lower bound on the starting time of j in any schedule (unless duplicate
copies of tasks are allowed).

PROOF:  That  each  task  is  optimally  scheduled  by  JLP  is  what  was  actually  demon- 
strated in  the  proof  of  Theorem  3.1.  That  with  a  limited  number  of  processors  no 
shorter  schedule  is  possible  than  with  an  unlimited  number  is  obvious  (Graham's  tim- 
ing anomalies  [GRAH69]  notwithstanding).  Without  the  assumption  of  unique  suc- 
cessors, JLP  will  frequently  schedule  successors  of  a  task  on  the  same  processor  at 
the  same  (or  overlapping)  time.  The  schedules  so  obtained  are  not  feasible,  but  the 
times,  S(j),  calculated  by  the  algorithm  can  still  only  be  better  than  those 
corresponding  to  a  feasible  schedule.  □ 

Hu  is  credited  with  the  first  formal  proof  that  the  Highest  Level  First  (HLF) 
scheduling  policy  could  produce  optimal  schedules  [HU61].  HLF  always  schedules 
first  one  of  the  available  tasks  of  highest  "level"  in  the  DAG.  "Level"  is  defined  the 
same  as  "height"  in  Section  2.5.  A  quarter  century  ago,  Hu  showed  that  this  strategy 
works  for  minimizing  the  makespan  on  any  number  of  processors  when  the  tasks  are 
unit execution time (UET) and the precedence is an in-tree [HU61]. Consider
now the case of a loosely coupled system of m processors and an in-tree DAG of n
UET  tasks  with  "unit  communication  times"  (UCT):  all  c[i,  j]  =  1  unless  i  and  j 
are  scheduled  on  the  same  processor.  HLF  is  a  tempting  strategy  here,  but  fails 
because  it  ignores  the  added  "height"  caused  by  the  communication.  Figure  3.5 
shows  a  case  in  which,  on  three  processors,  HLF  may  obtain  a  schedule  of  length  six, 
while  the  optimal  schedule  is  of  length  five. 
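
As a point of reference, the plain HLF rule for UET tasks on an in-tree, without any communication delays, can be sketched as follows. This is a minimal illustration rather than code from this work: the function names `levels` and `hlf_schedule` and the encoding of the in-tree as a `succ` dictionary are assumptions made for the example.

```python
def levels(succ, n):
    """Level (height) of each task: 1 for a root, else 1 + level of its successor.

    succ maps a task to its unique successor (in-tree); tasks are 1..n,
    numbered so that i -> j implies i < j.
    """
    level = {}
    for j in range(n, 0, -1):  # successors have higher numbers, so they come first
        level[j] = 1 if j not in succ else level[succ[j]] + 1
    return level


def hlf_schedule(succ, n, m):
    """Start times of UET tasks under HLF on m processors (no communication)."""
    level = levels(succ, n)
    unfinished_preds = {j: 0 for j in range(1, n + 1)}
    for i in succ:
        unfinished_preds[succ[i]] += 1
    ready = [j for j in range(1, n + 1) if unfinished_preds[j] == 0]
    start, t = {}, 0
    while ready:
        ready.sort(key=lambda j: -level[j])  # highest level first
        chosen, ready = ready[:m], ready[m:]
        for j in chosen:
            start[j] = t
        for j in chosen:  # each chosen task finishes at time t + 1
            if j in succ:
                k = succ[j]
                unfinished_preds[k] -= 1
                if unfinished_preds[k] == 0:
                    ready.append(k)
        t += 1
    return start
```

For the small in-tree in which tasks 1, 2, and 3 all precede task 4 and task 4 precedes task 5, `hlf_schedule({1: 4, 2: 4, 3: 4, 4: 5}, 5, 2)` starts tasks 1 and 2 at time 0, task 3 at time 1, task 4 at time 2, and task 5 at time 3. The sketch deliberately ignores communication, which is exactly the omission discussed above.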

Figure 3.5. A UET, UCT Problem for which HLF is suboptimal (tree drawn by Level (Height); diagram not reproduced)

Figure 3.6. Suboptimal HLF Schedule for Figure 3.5 (diagram not reproduced)

JLP ferrets out the unavoidable communication delays and allows an extended
definition of height which looks very promising as the basis of a new HLF algorithm
for loosely coupled systems. In the above example, it makes task 4 lower than tasks
5, 6, and 7, forcing an HLF strategy into the optimal strategy of scheduling task 7
rather than 4 in the second time slot and allowing 9 and 10 to start one time unit
earlier. The algorithm that follows, called Extended JLP, or EJLP, presents this
formally.

Algorithm 3.2. EJLP (Extended Join Latest Predecessor)

Input: Tasks 1, 2, ..., n (with unit processing times); precedence relation → such
that for any i, i → j for at most one j. Further assume that the tasks are
numbered such that i → j implies that i < j ("topological order"); an integer
m indicating the number of processors.

Output: For each i ≤ n, two numbers: P'(i) ≤ m, indicating the processor on which
task i should be scheduled; S(i), indicating the start time of task i on processor
P'(i).

BEGIN 

1. Execute JLP to assign a "processor number," P(i), to each task i.
{ P(i) may be > m. It is only used to determine height. }

2.  FOR  j  =  n  DOWN  TO  1  {  Assign  a  height,  h,  to  each  task.  } 
BEGIN 

IF  j  has  no  successors 

2.1. THEN Set h(j) = 1

2.2. ELSE { Let k be the unique successor of j. }

IF P(j) = P(k)

2.2.1. THEN Set h(j) = h(k) + 1

2.2.2. ELSE Set h(j) = h(k) + 2

END FOR

3. Execute HLF using the values h(j) to define the height, or level. A task j is
only available for scheduling at time t, however, if all its predecessors are
scheduled for time ≤ t - 1 and at most one of its predecessors is scheduled to
start at time t - 1. S(j) is the start time assigned to j by this HLF algorithm.
P'(j) ≤ m is arbitrary, except that

(a) if S(j) = S(i), then P'(j) ≠ P'(i), and

(b) if exactly one predecessor, k, of j is scheduled at time S(j) - 1, then P'(j)
must be equal to P'(k).

END  EJLP. 
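
Step 2 of EJLP, the assignment of extended heights, can be sketched in a few lines. This is an illustrative sketch only: the function name `extended_heights` and the dictionary encoding are assumptions, and the processor numbers P are taken as given input because Step 1 (JLP itself) is not recomputed here.

```python
def extended_heights(succ, P, n):
    """Step 2 of EJLP: heights that include unavoidable communication.

    succ[j] is the unique successor of task j (absent for a root);
    P[j] is the processor number that JLP assigned to task j.
    Tasks are numbered 1..n in topological order (i -> j implies i < j).
    """
    h = {}
    for j in range(n, 0, -1):  # reverse order: h[succ[j]] is already known
        if j not in succ:
            h[j] = 1  # Step 2.1
        else:
            k = succ[j]
            # Step 2.2.1 (same processor) or Step 2.2.2 (message needed)
            h[j] = h[k] + (1 if P[j] == P[k] else 2)
    return h
```

With two leaves 1 and 2 preceding task 3, and hand-chosen processor numbers `P = {1: 1, 2: 2, 3: 2}` standing in for the output of JLP, the call `extended_heights({1: 3, 2: 3}, P, 3)` gives task 3 height 1, task 2 height 2, and task 1 height 3, the extra unit on task 1 reflecting the message it must send.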

LEMMA 3.2: The EJLP Algorithm produces a feasible schedule for an in-tree DAG
under the UET and UCT assumptions.

PROOF: Step 1 can be carried out since all assumptions for JLP except Sufficient
Processors are in effect and the numbers P(j) are not to be interpreted as actual
assignments to processors. Step 2 makes sense because by taking the tasks in reverse
order, the topological ordering guarantees that if k is the immediate successor of j,
then k > j, so h(k) is assigned before considering task j. Finally, HLF assigns ready
tasks to available processors and presents no problem when there is no communication
overhead. Under the assumptions for EJLP, all tasks and all communication
times are of length one; therefore each task will become ready (all predecessors
scheduled and all messages received) at some integer time t = 0, 1, 2, ... . If all
but one of the predecessors of a task j are scheduled to start before time t - 1, then
by time t, all messages to j except those of the latest predecessor, say i, have been
received. By scheduling j on the same processor as i, no messages need be sent from
i to j and hence j can start at time t.

If,  on  the  other  hand,  two  predecessors  of  j  are  scheduled  to  start  at  time  t-1, 
then no matter to what processors they are assigned, j will have to wait for messages
from one of them and hence cannot start until time t+1. The conditions presented
in  Step  3  precisely  assure  that  tasks  are  not  scheduled  before  it  is  feasible  to  do  so. 
Since  there  is  a  maximum  of  one  successor  for  each  task,  condition  (b)  of  Step  3  can 
always  be  met.  □ 
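
The feasibility conditions used in this proof can be checked mechanically for any proposed schedule. The following is a minimal sketch under the UET and UCT assumptions; the function name `is_feasible_uet_uct` and the dictionary encoding of the schedule are assumptions made for the illustration.

```python
def is_feasible_uet_uct(succ, S, Pp):
    """Check a schedule of unit tasks with unit communication times.

    succ[i] is the unique successor of task i; S[i] is the start time and
    Pp[i] the processor of task i (P' in the text).
    """
    # No two tasks may occupy the same processor in the same time slot.
    slots = [(S[i], Pp[i]) for i in S]
    if len(slots) != len(set(slots)):
        return False
    for i, j in succ.items():
        delay = 0 if Pp[i] == Pp[j] else 1  # a message is needed only across processors
        if S[j] < S[i] + 1 + delay:  # task i finishes at S[i] + 1
            return False
    return True
```

For two leaves 1 and 2 that both start at time 0 and precede task 3, the check rejects a start time of 1 for task 3 on either processor and accepts a start time of 2, which is the "two predecessors at time t - 1" case of the proof. If the leaves instead start at times 0 and 1 and task 3 joins the later one, a start time of 2 is accepted.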

CONJECTURE: For m < 6 processors, EJLP always produces a schedule with
minimum makespan for a set of UET, UCT tasks satisfying the In-Forest Precedence,
Identical Links, and Fully Connected hypotheses.

The unusual number m = 6 features in this conjecture because it can be shown
that for m < 6 EJLP has to reduce the "height" calculated by the algorithm in
such a way as to even out the heights of the highest remaining tasks. In the example
of Figure 3.7, however, one valid EJLP schedule takes the rightmost six leaves at
time zero, the leftmost six leaves at time one, the rightmost six new leaves at time
two, and all available tasks from then on, producing a schedule of length six. In this
case, the two leftmost subtrees are assigned heights of three and four, respectively,
and the EJLP schedule indicated reduces these to one and three at time t = 1 and
then to heights zero and two at time t = 3. The schedule produced is of length six,
while an optimal schedule would be of length five.


Figure 3.7. An In-Forest Where EJLP Fails with m = 6 (forest drawn by Height (h), levels 5 down to 1; diagram not reproduced)

It is important to observe in this example, as in the example of Figure 3.6, that
the  algorithms  being  studied  could  produce  optimal  schedules  in  the  given  cases.  An 
algorithm  is  considered  optimal,  however,  only  if  any  schedule  which  follows  the  rules 
of  the  algorithm  must  be  optimal.  For  the  DAG  of  Figure  3.7,  EJLP  could  also  have 
chosen  the  three  tasks  of  height  five  and  the  three  leftmost  tasks  of  height  four  at 
time zero and proceeded to produce an optimal schedule of length five. It is tempting
to conjecture that EJLP would be optimal for all m if modified to pick, among
tasks of the same height, in such a way as to minimize the number of tasks left
"blocked."  (A  task  is  blocked  if  all  its  predecessors  have  been  scheduled  but  it  must 
still  wait  for  a  message.  Blockage  occurs  only  when  the  last  two  or  more  predeces- 
sors were  scheduled  at  the  last  time  interval.  The  effect  of  blockage  is  reflected  in 
Step  2.2.2  of  EJLP.) 

The foregoing paragraphs have considered what happens when the assumptions
of the JLP algorithm are relaxed by allowing longer communication times or by restricting
the number of processors. As a final investigation, the Short Communication
Times and Sufficient Processors hypotheses are reinstated, but all restrictions on the
DAG are dropped. Recalling Proposition 3.1 presented in Section 3.1, assuming m is
arbitrarily large and communication delays are shorter than task processing times,
the worst case bound for applying the ETF strategy becomes

$M_{ETF} \le 2\,M_{opt} + CPT$,  (2)

where, as usual, CPT = Critical Path Time is the longest total processing time along
any path in the DAG. Since the makespan is never less than CPT, this could be
weakened slightly and written simply as

$M_{ETF} \le 3\,M_{opt}$.  (3)
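
The step from (2) to (3) uses only the fact that no schedule can be shorter than the critical path. Written out, and assuming bound (2) has the form $M_{ETF} \le 2 M_{opt} + CPT$ (Proposition 3.1 itself is stated in Section 3.1 and is not repeated here), the weakening is:

```latex
\begin{align*}
CPT &\le M_{opt}
  && \text{(no schedule is shorter than the critical path)} \\
M_{ETF} &\le 2\,M_{opt} + CPT \;\le\; 2\,M_{opt} + M_{opt} \;=\; 3\,M_{opt}.
\end{align*}
```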

Surprisingly, it is possible to produce an optimal scheduling algorithm for the case of
Sufficient Processors and Short Communication Times provided duplicate executions of the
same tasks are once again allowed. The optimal strategy is also an extension of the
basic  JLP  algorithm.  It  is  based  on  the  observation  that  any  DAG,  through  duplica- 
tion of  tasks  with  out-degree  greater  than  one,  can  be  turned  into  an  in-tree  whose 
execution  will  produce  the  same  result  as  the  execution  of  the  original  DAG.  It 
should  be  warned,  however,  that  whereas  such  duplication  makes  perfectly  good 
sense  in  computer  science,  as  a  model  for  an  assembly  procedure  or  industrial  plan- 
ning it  could  be  complete  nonsense.  An  informal  description  of  an  algorithm 
(JLP/D)  embodying  this  idea  appears  below.  The  algorithm  is  based  on  a  suggestion 
by  an  anonymous  reviewer  of  a  paper  on  the  JLP  algorithm  submitted  for  publica- 
tion by  the  author,  J.-J.  Hwang,  and  Y.-C.  Chow. 

Algorithm JLP/D (JLP with task duplication)

Input: Tasks 1, 2, ..., n, with processing times u_1, u_2, ..., u_n; arbitrary precedence
relation →. Communication delays c[i,j] for each i → j such that
c[i,j] ≤ u_k for all i, j, k. Further assume that the tasks are numbered such
that i → j implies that i < j ("topological order").

Output: For each i ≤ n, three numbers: P(i), indicating the processor on which task i
should be scheduled; S(i) and F(i), indicating the start time and finish time,
respectively, of task i on processor P(i).

BEGIN 

Execute JLP to assign processor numbers, P(i), start times, S(i), and finish
times, F(i), to each task i, but with the following modification: Whenever, in
the j-th iteration of the main loop, JLP schedules task j in such a way that
P(j) = P(i) and the intervals (S(j), F(j)) and (S(i), F(i)) overlap for some
i < j, copy the part of the schedule consisting of all the predecessors of j onto
a disjoint set of processors. If JLP scheduled j on the same processor as its
immediate predecessor k, then reassign P(j) = P(k'), where k' is the copy just
created of k. (In succeeding iterations of the main loop, it is not necessary to
consider the copies created in the calculations performed in Steps 1.2.1 and
1.2.3.)

END. 
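
The observation underlying JLP/D, that duplicating tasks of out-degree greater than one turns a DAG into an in-tree, can be illustrated separately from the scheduling itself. The sketch below is not the JLP/D algorithm; it only unfolds the part of a DAG that feeds one final task. The function name `unfold_to_in_tree` and the `preds` dictionary encoding are assumptions made for the example.

```python
def unfold_to_in_tree(preds, sink):
    """Unfold the sub-DAG feeding `sink` into an in-tree by duplicating tasks.

    preds[j] lists the immediate predecessors of task j. Returns a list of
    (copy_id, original_task, successor_copy_id) triples; the root has
    successor_copy_id None. Every copy has exactly one successor.
    """
    tree = []

    def copy_task(task, succ_copy):
        copy_id = len(tree)
        tree.append((copy_id, task, succ_copy))
        for p in preds.get(task, []):
            copy_task(p, copy_id)  # a fresh copy of p for each of its uses

    copy_task(sink, None)
    return tree
```

For the "diamond" DAG in which task 1 precedes tasks 2 and 3, and both precede task 4, `unfold_to_in_tree({4: [2, 3], 2: [1], 3: [1]}, 4)` yields five copies, with task 1 appearing twice, once under each of its successors. The unfolded tree can be much larger than the original DAG, since a task is copied once for every path from it to the final task.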

THEOREM 3.4: JLP/D is optimal with respect to makespan for arbitrary DAGs
run on loosely coupled systems and satisfying the Sufficient Processors and Short
Communication Times hypotheses.

PROOF: As a consequence of Lemma 3.1, JLP/D produces an optimal schedule provided
that it is feasible, since it starts all tasks at the times given by JLP. (It is crucial
here that Steps 1.2.1 and 1.2.3 of the JLP algorithm do not depend on the previous
assignments of tasks to processors, only on the S(i) and F(i). Therefore, the
copying and reassigning of processors will not change the value calculated for S(j),
only the processor assigned to it.) To see that JLP/D creates a feasible schedule,
refer to the proof of feasibility in the case of a forest of in-trees to see that, even for
arbitrary precedence relations, the starting times, S(i), are late enough to assure that
all predecessors (and all their copies) have finished execution and enough time has
passed for all messages to have arrived at the chosen processor. Moreover, the
modification to JLP precisely checks for any problem due to two tasks being
scheduled simultaneously on the same processor and changes processors to avoid the
conflict. Since all predecessors of the task so moved are also copied, a simple induction
proof establishes that the new schedule produced remains feasible. □


CHAPTER  IV 

OTHER  MULTIPROCESSOR  SCHEDULING  PROBLEMS 


In the course of this work, a large number of different scheduling problems have
been discussed. Despite their differences, however, they all share many basic characteristics.
Most of the ground rules were presented in Chapter I or in the discussion of
communication in Chapter III; nonetheless, some were tacitly assumed. This closing
chapter takes a look at some computer systems in which different assumptions are
appropriate and also examines some different kinds of scheduling problems with the
intention of providing both contrast for the foregoing work and future avenues
of extension.

4.1. More MIMD Scheduling Problems

Several  performance  measures  were  introduced  and  briefly  discussed  in  Section 
1.1  in  order  to  emphasize  that  meaningful  scheduling  must  always  be  related  to  the 
achievement  of  some  measurable  goal.  Although  improving  one  or  more  of  the  per- 
formance measures  used  to  evaluate  a  system  may  be  the  long  range  goal,  scheduling 
methods  are  usually  directed  at  optimizing  some  more  immediate  values.  An  indus- 
trial organization,  for  example,  may  wish  to  lower  the  dollar  value  of  its  inventory  of 
raw  materials  through  more  careful  scheduling  of  its  manufacturing  process.  Rather 
than talking about minimizing inventory, however, the relationship between turnaround
times and inventory can be exploited: longer average turnaround time
means higher average number of jobs waiting to be completed and therefore greater
amounts of the necessary raw materials must be available. In a data processing
center that wishes to maximize its profit, there may be more income for finishing
more jobs per day and also penalties for jobs finished after established deadlines. In
this case, scheduling methods minimizing the number of late jobs or minimizing the
makespan of a collection of jobs may be selected.

Chapter II gave much attention to minimizing the average turnaround time,
one of the most common objectives of scheduling problems in the literature. Notwithstanding,
it is recognized that policies optimizing average turnaround time also
prejudice long jobs. If user satisfaction is linked to getting jobs done as soon as possible,
lowering average turnaround time should mean that, on average, customer satisfaction
is increased. The difficulty is that some few customers may become very
dissatisfied at the same time. It may, therefore, be found better to try to minimize
the  maximum  turnaround  time,  causing  some  small  displeasure  for  customers  with 
longer  waiting  times  but  assuring  all  of  reasonable  time  to  completion.  Even  more 
"fair"  to  customers  would  be  to  try  to  minimize  the  variance  of  the  turnaround  times 
without  allowing  the  average  turnaround  time  to  increase  too  much.  Policies  such  as 
Shortest  Job  First  tend  to  increase  variance  rather  than  minimize  it;  hence,  practical 
computer  schedulers  concerned  with  customer  satisfaction  use  some  form  of  modified 
SJF  which  raises  priorities  on  jobs  that  have  had  to  wait  a  long  time  for  service. 
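
The tension between average turnaround time and its variance is easy to see on a small example. The sketch below is purely illustrative: it assumes a single processor, all jobs arriving at time zero, and a hypothetical list of job lengths; the function names are likewise assumptions.

```python
from statistics import mean, pvariance


def turnaround_times(job_lengths):
    """Completion times of jobs run back to back on one processor.

    All jobs are assumed to arrive at time zero, so turnaround time
    equals completion time.
    """
    finished, clock = [], 0
    for length in job_lengths:
        clock += length
        finished.append(clock)
    return finished


def summarize(times):
    """Average, maximum, and (population) variance of turnaround times."""
    return mean(times), max(times), pvariance(times)


jobs = [6, 1, 1, 1]                    # hypothetical job lengths, in arrival order
fcfs = turnaround_times(jobs)          # First Come First Served
sjf = turnaround_times(sorted(jobs))   # Shortest Job First
```

For these lengths, `summarize(fcfs)` gives an average of 7.5 and a variance of 1.25, while `summarize(sjf)` gives an average of 3.75 and a variance of 9.6875: SJF halves the average turnaround time, but the one long job now waits far longer than the rest.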

Of course, many more esoteric objective functions have been used. For example,
minimizing root mean square tardiness tends to avoid very tardy jobs more than
if the objective is just to minimize the number of tardy jobs or the average tardiness.
Many scheduling problems are posed in terms of optimizing some sort of weighted
average, counting the completion of more "important" jobs more heavily than others.
In  general,  the  choice  of  objective  function  depends  on  many  factors  and  can  be 
difficult;  in  the  end,  however,  this  choice  is  often  dictated  by  the  need  for  simplicity 
in  producing  a  tractable  problem.  Most  research  concentrates  on  a  small  number  of 
possible objectives, largely because other objectives present far more difficult problems
for analysis.

Besides changing the objective function, several kinds of assumptions can be
made on the processors. In their computerized summary of scheduling results
referred to already in this work, Lageweg et al. [LAGE81b] include the cases in
which  processors  are  equivalent  but  work  at  different  speeds,  in  which  processors  are 
completely  unrelated  in  their  capabilities,  and  in  which  processors  are  of  k  different 
types  corresponding  to  k  different  operations  which  must  be  performed  on  each  job. 
Many  multiple  processor  systems  (MPS)  contain  a  variety  of  processors  or  processors 
which cannot work independently of one another. Additionally, intelligent I/O channels
are processors dedicated to specialized activities. Careful modeling of such systems
requires different assumptions on processors as well as on job structure. In Section
4.2 of this chapter, more will be said about specialized computer architectures
and  their  scheduling  problems. 

It is important to keep in mind that the added complication of interprocessor
communication overhead in loosely coupled systems places these scheduling problems
completely outside of the traditional classification schemes. This extra aspect in
computer system behavior has engendered several different approaches, including new
performance measures and new techniques such as distributed schedulers [STAN84].
Efficient use of such systems may be seen as maximizing throughput, as before, but it
can  also  be  seen  as  maximizing  the  average  processor  utilization,  maximizing  the 
minimum  processor  utilization  or  minimizing  the  communication  time.  W.  W.  Chu 
and  others  [CHUL84a,  CHUL84b]  have  presented  models  of  distributed  processing 
systems  which  focus  on  the  communications  between  processes  and  the  delays  these 
cause  in  order  to  provide  methods  of  prediction  and  performance  analysis  more 
relevant  to  these  systems.  Y.-C.  Chow  [CHOW79]  and  T.  C.  K.  Chou  [CHOU82] 
have  studied  the  question  of  load  balancing  in  these  systems  as  a  dynamic  problem. 
Load  balancing  methods  address  the  problem  of  task  allocation  by  attempting  to 
keep all processors approximately equally busy. Although this objective is different
from those discussed, it clearly tends to produce system utilization which also
improves  performance  as  measured  by  other  scheduling  criteria. 
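
The utilization measures mentioned above can be computed directly from a finished schedule. The following sketch assumes that the total busy time of each processor and the makespan are already known; the function names are hypothetical and chosen only for this illustration.

```python
def utilizations(busy_time, makespan):
    """Fraction of the schedule length during which each processor is busy."""
    return [b / makespan for b in busy_time]


def average_utilization(busy_time, makespan):
    """Mean utilization over all processors."""
    u = utilizations(busy_time, makespan)
    return sum(u) / len(u)


def minimum_utilization(busy_time, makespan):
    """Utilization of the least busy processor."""
    return min(utilizations(busy_time, makespan))
```

With busy times of 8, 4, and 4 on three processors and a makespan of 8, the minimum utilization is 0.5 and the average is two thirds. A load balancing method would try to raise the minimum toward the average.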

The investigation of loosely coupled systems carried out in Chapter III concerned
itself entirely with the scheduling problem of minimizing the makespan of
precedence related tasks assuming six properties of the communication overhead in
the system. These assumptions are listed in Section 3.1 with the intent of imposing
sufficient structure on the problem to be able to treat scheduling with communication
overhead as a well-behaved, deterministic activity. If, for example, the time,
d(P,P'), required to send a single message unit from P to P' were to vary with time
due to events outside the control of the scheduler, then it would not be possible to
predict the actual time lost due to the communications. If communication protocols
allowed collisions, once again it would not be possible to predict the communication
costs exactly. Without these assumptions, however, it would be possible to carry out
non-deterministic analysis given the probabilistic information on collisions and channel
speeds.

These six assumptions alone, however, are not enough to obtain reasonable
results or scheduling algorithms. Subsequently, Section 3.2 introduced three additional
hypotheses: Short Communication Delays, Identical Links, and Fully Connected
architecture. It is possible to define a number of deterministic scheduling
problems which do not satisfy one or more of these three conditions, and it is in this
area that the author believes that productive research can be done. Section 3.3 commented
that the JLP algorithm is no longer optimal if Short Communication Delays
does not hold, but no alternative was suggested. J.-J. Hwang, however, does present in
his dissertation [HWAN87] a heuristic scheduling strategy, Earliest Task First (ETF),
together with the worst-case bound presented as Proposition 3.1 above, which looks
promising even with arbitrary communication delays. ETF, for example, produces
optimal schedules in the case of a forest of in-trees and with the Sufficient Processors
hypothesis, as does Join Latest Predecessor (JLP), but also performs optimally in the case
of a forest of out-trees with extremely long communication delays, where JLP fails
miserably. Another possibility is to use the JLP/D algorithm of Section 3.3 as a
heuristic when running duplicate copies of tasks is allowed. It appears that a
reasonable worst-case bound is obtainable for this heuristic in the face of arbitrary
communication delays.

The  Identical  Links  assumption,  which  makes  all  the  message  transmission  rates 
equal,  is  probably  not  such  a  critical  requirement.  Removing  this  assumption  may  be 
about the same level of complication as moving from identical processors to homogeneous
processors: processors which differ only in speed. Many optimal algorithms
have been obtained for scheduling problems in such an environment, although other
formerly easy problems become NP-hard [LAWL82, LAGE81b].

The Fully Connected assumption is, perhaps, the least realistic of the restrictions,
particularly if many processors are involved. Unfortunately, the alternative
(not fully connected) is not one, but a panoply of different problems. Fully Connected
not only hypothesizes that it is possible to get from any processor to any other, but
also that such communication is direct and contention free. In a single-bus system,
communication is direct but contention ridden; in a hypercube system, communication
is frequently indirect and experiences queuing delays at intermediate nodes.
With  more  general  network  topologies,  routing  becomes  a  major  issue:  certain  links 
may  suffer  high  contention  and  others  be  essentially  contention  free.  This  is  an 
important  area  of  research,  since  it  is  here  that  contact  is  made  with  real  systems, 
but  each  case  will  have  to  be  approached  separately  using  heuristic  algorithms  and 
non-deterministic analysis.

4.2. SIMD and Specialized Architecture Problems

Throughout this work, the underlying model has been that of the general-purpose
MIMD computer in which the tasks are thought of as program modules and
each processor works asynchronously and independently from every other. Many of
the important issues of efficient use of computing power today deal, on the contrary,
with vector processors, systolic arrays, dataflow architectures, and other combinations
of processors and control systems for which quite different models are needed.
The entire discussion in Chapter II applies primarily to the concern for user satisfaction
in an interactive environment or other situation in which the turnaround times
of sizable jobs are of interest. The results of Chapter III, on the other hand, can also
be significant in case the tasks are single instructions or single operations and the
DAG models the evaluation of a single expression. The following paragraphs briefly
discuss some of these alternative architectures and their scheduling problems.

The vector processor is an example of a synchronous SIMD architecture capable
of performing a particular operation simultaneously on some number of different
values or pairs of values. As the name suggests, it is ideal for carrying out such vector
operations as vector sums and inner products which are typical of many important
scientific applications. While the problem of developing efficient algorithms to
utilize  this  specialized  architecture  is  an  important  area  of  current  research,  from  the 
point  of  view  of  scheduling,  this  problem  is  actually  no  different  from  the  classical 
scheduling  problems.  Once  the  algorithm  is  fixed,  the  whole  vector  operation  is  best 
treated  as  a  single  operation,  reducing  the  problem  to  the  equivalent  of  a  single  pro- 
cessor problem. 

Another closely allied synchronous architecture is that of the systolic array. In
this case, however, the processors are typically specialized operators and at least part
of the input data to one processor is the output from a neighboring processor. One
or more data streams march through a prescribed sequence of processors in lock-step,
the output being taken from one or more of the processors last visited in the
sequence. Very similar to this is the dataflow computer, except that in this case the
processors are asynchronous, their operations being triggered by the appearance of
input data from a neighboring processor. In both cases there is typically a very
large number of processors with very sparse interconnections; say a rectangular
array with each processor connected to the four nearest neighbors. Once again,
major research questions for such architectures are methodologies for creating algorithms
and designs which are compatible: what is known as "mapping" applications
to architectures. Nevertheless, much of the discussion in Chapter III is relevant to
dataflow architectures, where tasks may well be considered UET and communication
delays short. The sparse interconnections create new considerations, but the communications
only go to nearest neighbors and are, therefore, contention free.
Optimal scheduling of a DAG representing the precedence relations among
instruction-size tasks under these conditions is a challenging area for continued investigation.


4.3.  Open  Questions 

There remain, as must be the case in an actively expanding area of research
such as this, many questions whose answers appear close at hand, but which may still
be very difficult. Presented below are only the most immediate extensions of this
research which the author feels should be attacked next.

(1) Continue the simulation studies on the same scheduling methods of Chapter II
as well as using the definition of Size(J) of a job J (Section 2.5) to define alternative
methods.

(2) Obtain a tighter bound on the amount by which Size(J) can underestimate the
optimal makespan of J. The author believes that makespan/Size(J) is asymptotic
to 3/2 as m becomes large, rather than to 2 as implied by the bound of
Proposition 2.2.

(3)  Implement  some  of  the  algorithms  on  a  real  MPS  to  study  their  actual  perfor- 
mance. The  overhead  associated  with  running  the  scheduling  algorithms  is 
neglected  in  the  theoretical  discussion,  other  than  to  establish  their  time  com- 
plexity. For  a  dynamic  scheduler,  this  overhead  could  determine  if  it  is  of  prac- 
tical value. 

(4)  Determine  a  worst  case  bound  for  the  JLP/D  algorithm  when  applied  to  prob- 
lems with  arbitrary  communication  delays. 

(5)  Prove  the  conjecture  that  EJLP  is  optimal  for  m  <  6  and  determine  if  a  simple 
modification  makes  it  optimal  for  arbitrary  m . 

(6) Find a reasonable scheduling policy for a single-bus system. Such a policy must
either assume knowledge of the priorities given by the bus protocol or else give
only a heuristic method, since bus contention will cause significant delays in
message passing.


GLOSSARY


CPT      Critical Path Time                          (Measure of job length)
CPU      Central Processing Unit
CSJF     Concurrent Shortest Job First               (Scheduling algorithm)
DAG      Directed Acyclic Graph
DM       Degree of Multiprogramming
EJLP     Extended Join Latest Predecessor            (Scheduling algorithm)
ETF      Earliest Task First                         (Scheduling algorithm)
FAT      First Available Task                        (Scheduling algorithm)
FCFS     First Come First Served                     (Scheduling algorithm)
HLF      Highest Level First                         (Scheduling algorithm)
JLP      Join Latest Predecessor                     (Scheduling algorithm)
JLP/D    Join Latest Predecessor with Duplications   (Scheduling algorithm)
LJF      Longest Job First                           (Scheduling algorithm)
LRTF     Longest Remaining Task First                (Scheduling algorithm)
MIMD     Multiple Instruction Multiple Data          (System type)
MISF     Most Immediate Successors First             (Scheduling algorithm)
MPS      Multiple Processor System                   (System type)
NP       Nondeterministic Polynomial
SCPF     Shortest Critical Path First                (Scheduling algorithm)
SIMD     Single Instruction Multiple Data            (System type)
SISD     Single Instruction Single Data              (System type)
SJF      Shortest Job First                          (Scheduling algorithm)
SPT      Shortest Processing Time                    (Scheduling algorithm)
SRCPF    Shortest Remaining Critical Path First      (Scheduling algorithm)
SRTF     Shortest Remaining Time First               (Scheduling algorithm)
SSJF     Sequential Shortest Job First               (Scheduling algorithm)
TPT      Total Processing Time                       (Measure of job length)
UCT      Unit Communication Times
UET      Unit Execution Times
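
The glossary lists CPT and TPT as measures of job length. As an illustration only, the Python sketch below computes both for a small job given as a DAG of tasks. It assumes the usual readings of the two terms (TPT as the sum of all task times, CPT as the length of the longest precedence chain); the function names, the dictionary representation, and the four-task example job are hypothetical and are not taken from the text.

```python
def total_processing_time(times):
    """TPT: sum of the execution times of all tasks in the job (assumed reading)."""
    return sum(times.values())


def critical_path_time(times, successors):
    """CPT: length of the longest chain of tasks in the DAG (assumed reading).

    times      -- dict mapping task name to execution time
    successors -- dict mapping task name to a list of immediate successors
    """
    memo = {}

    def longest_from(task):
        # Longest path that starts at `task`, counting its own execution time.
        if task not in memo:
            tail = max((longest_from(s) for s in successors.get(task, [])), default=0)
            memo[task] = times[task] + tail
        return memo[task]

    return max(longest_from(t) for t in times)


# Hypothetical job: a precedes b and c, and both precede d.
times = {"a": 2, "b": 3, "c": 1, "d": 4}
successors = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}

print(total_processing_time(times))           # 10
print(critical_path_time(times, successors))  # 9, along a -> b -> d
```

On a single processor the job cannot finish in less than its TPT, and on any number of processors it cannot finish in less than its CPT, which is why both serve as measures of job length.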


APPENDIX A

RESULTS  OF  FRIEDMAN  TWO-WAY  ANALYSIS  OF  RANK  VARIANCE 


I.  Results  using  the  First  Available  Task  (FAT)  Task-Level  Strategy 


DM = 5     m = 3

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      4.0     3.0     5.0     1.0     2.0
LongJobs      5.0     2.0     4.0     1.0     3.0
RandomJobs    4.0     3.0     2.0     5.0     1.0

χr² = 4.533


DM = 5     m = 5

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     2.5     4.0     2.5     1.0
LongJobs      2.0     5.0     1.0     4.0     3.0
RandomJobs    5.0     2.0     4.0     3.0     1.0

χr² = 3.400


DM = 5     m = 7

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      3.0     2.0     4.0     5.0     1.0
LongJobs      2.0     1.0     5.0     4.0     3.0
RandomJobs    4.0     1.0     2.0     5.0     3.0

χr² = 7.733


DM = 5     m = 9

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      1.0     3.0     4.5     4.5     2.0
LongJobs      1.0     2.0     5.0     4.0     3.0
RandomJobs    1.0     2.0     5.0     4.0     3.0

χr² = 11.133


DM = 5     m = 11

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      2.0     1.0     5.0     4.0     3.0
LongJobs      2.0     1.0     5.0     4.0     3.0
RandomJobs    1.0     2.0     5.0     4.0     3.0

χr² = 11.467


DM = 5     m = 13

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      1.0     2.0     4.0     5.0     3.0
LongJobs      2.0     1.0     5.0     4.0     3.0
RandomJobs    1.0     2.0     5.0     4.0     3.0

χr² = 10.933


DM = 5     m = 15

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      2.0     1.0     4.5     4.5     3.0
LongJobs      2.0     1.0     4.5     4.5     3.0
RandomJobs    2.0     1.0     4.0     5.0     3.0

χr² = 11.467


DM = 10    m = 3

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     4.0     2.0     3.0     1.0
LongJobs      4.0     1.0     3.0     5.0     2.0
RandomJobs    3.0     1.0     4.0     5.0     2.0

χr² = 6.667


DM = 10    m = 5

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     3.0     2.0     4.0     1.0
LongJobs      5.0     4.0     1.0     3.0     2.0
RandomJobs    3.0     2.0     4.0     5.0     1.0

χr² = 7.200


DM = 10    m = 7

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     2.0     4.0     3.0     1.0
LongJobs      2.0     4.0     5.0     3.0     1.0
RandomJobs    3.0     1.0     5.0     4.0     2.0

χr² = 7.467


DM = 10    m = 9

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      4.0     5.0     2.0     3.0     1.0
LongJobs      2.0     5.0     3.0     4.0     1.0
RandomJobs    2.0     5.0     3.0     4.0     1.0

χr² = 10.400


DM = 10    m = 11

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     3.0     2.0     4.0     1.0
LongJobs      1.0     2.0     5.0     3.0     4.0
RandomJobs    2.0     3.0     5.0     4.0     1.0

χr² = 3.200


DM = 10    m = 13

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      2.0     3.0     4.0     5.0     1.0
LongJobs      2.0     1.0     5.0     4.0     3.0
RandomJobs    5.0     4.0     1.0     2.0     3.0

χr² = 1.333


DM = 10    m = 15

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      3.0     2.0     4.0     5.0     1.0
LongJobs      1.0     2.0     5.0     4.0     3.0
RandomJobs    2.0     3.0     5.0     4.0     1.0

χr² = 9.333


DM = 20    m = 3

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      4.0     5.0     3.0     2.0     1.0
LongJobs      3.0     5.0     2.0     4.0     1.0
RandomJobs    2.0     3.0     5.0     4.0     1.0

χr² = 7.200


DM = 20    m = 5

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      4.0     3.0     2.0     5.0     1.0
LongJobs      4.0     5.0     2.0     1.0     3.0
RandomJobs    2.0     3.0     4.0     5.0     1.0

χr² = 3.467


DM = 20    m = 7

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      3.0     5.0     2.0     4.0     1.0
LongJobs      2.0     3.0     4.0     5.0     1.0
RandomJobs    1.0     2.0     4.0     5.0     3.0

χr² = 6.933


DM = 20    m = 9

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     3.0     2.0     4.0     1.0
LongJobs      1.0     5.0     2.0     4.0     3.0
RandomJobs    2.0     1.0     5.0     3.0     4.0

χr² = 0.800


DM = 20    m = 11

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     4.0     3.0     2.0     1.0
LongJobs      3.0     5.0     2.0     4.0     1.0
RandomJobs    4.0     5.0     3.0     1.0     2.0

χr² = 8.533


DM = 20    m = 13

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      3.0     5.0     2.0     4.0     1.0
LongJobs      2.0     4.0     5.0     3.0     1.0
RandomJobs    3.0     4.0     5.0     2.0     1.0

χr² = 8.267


DM = 20    m = 15

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     4.0     2.0     3.0     1.0
LongJobs      5.0     2.0     4.0     3.0     1.0
RandomJobs    3.0     5.0     4.0     1.0     2.0

χr² = 6.667
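
The statistic reported beneath each table can be checked from the ranks alone. The Python sketch below assumes the standard Friedman formula without a tie correction, 12/(n k (k+1)) times the sum of squared column rank sums, minus 3 n (k+1), with n = 3 job classes (rows) and k = 5 job-level strategies (columns). That choice of formula and the function name friedman_statistic are assumptions made for illustration; the text does not state how the values were computed. Applied to the first table of this section (DM = 5, m = 3), the sketch reproduces the printed value 4.533.

```python
def friedman_statistic(ranks):
    """Friedman two-way rank statistic, without tie correction (assumed formula).

    ranks -- list of rows; each row holds the ranks of the k treatments
             within one block and sums to k(k+1)/2.
    """
    n = len(ranks)        # number of blocks (job classes)
    k = len(ranks[0])     # number of treatments (job-level strategies)
    column_sums = [sum(row[j] for row in ranks) for j in range(k)]
    sum_of_squares = sum(r * r for r in column_sums)
    return 12.0 / (n * k * (k + 1)) * sum_of_squares - 3.0 * n * (k + 1)


# Ranks from the DM = 5, m = 3 table (columns: SJF, SRTF, SCPF, SRCPF, Random).
table = [
    [4.0, 3.0, 5.0, 1.0, 2.0],  # WideJobs
    [5.0, 2.0, 4.0, 1.0, 3.0],  # LongJobs
    [4.0, 3.0, 2.0, 5.0, 1.0],  # RandomJobs
]

print(round(friedman_statistic(table), 3))  # 4.533
```

The same function also reproduces the value 3.400 printed for DM = 5, m = 5, a table that contains tied ranks (2.5), which suggests that no tie correction was applied to the tabulated values.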


II.  Results using the Most Immediate Successors First (MISF) Task-Level Strategy


DM = 5     m = 3

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      4.0     5.0     2.0     3.0     1.0
LongJobs      4.0     5.0     3.0     2.0     1.0
RandomJobs    3.0     4.0     2.0     5.0     1.0

χr² = 9.333


DM = 5     m = 5

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      3.0     4.0     2.0     5.0     1.0
LongJobs      3.0     2.0     5.0     4.0     1.0
RandomJobs    2.0     3.0     5.0     4.0     1.0

χr² = 8.267


DM = 5     m = 7

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     2.0     4.0     1.0     3.0
LongJobs      4.0     1.0     2.0     3.0     5.0
RandomJobs    1.0     4.0     2.0     5.0     3.0

χr² = 1.333


DM = 5     m = 9

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      4.0     3.0     2.0     1.0     5.0
LongJobs      3.0     1.0     4.0     2.0     5.0
RandomJobs    4.0     1.5     3.0     1.5     5.0

χr² = 9.667


DM = 5     m = 11

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      2.0     1.0     3.5     3.5     5.0
LongJobs      3.0     1.0     4.0     2.0     5.0
RandomJobs    3.5     2.0     3.5     1.0     5.0

χr² = 9.533


DM = 5     m = 13

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      4.0     2.0     3.0     1.0     5.0
LongJobs      3.5     1.5     3.5     1.5     5.0
RandomJobs    3.5     3.5     2.0     1.0     5.0

χr² = 9.933


DM = 5     m = 15

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      4.0     3.0     2.0     1.0     5.0
LongJobs      3.0     1.0     3.0     3.0     5.0
RandomJobs    3.5     2.0     3.5     1.0     5.0

χr² = 8.467


DM = 10    m = 3

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      2.0     5.0     3.0     4.0     1.0
LongJobs      5.0     1.0     2.0     3.0     4.0
RandomJobs    4.0     3.0     5.0     2.0     1.0

χr² = 1.867


DM = 10    m = 5

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      3.0     5.0     2.0     4.0     1.0
LongJobs      4.0     5.0     2.0     3.0     1.0
RandomJobs    5.0     1.0     4.0     3.0     2.0

χr² = 5.333


DM = 10    m = 7

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     1.0     4.0     3.0     2.0
LongJobs      4.0     3.0     2.0     5.0     1.0
RandomJobs    4.0     2.0     3.0     5.0     1.0

χr² = 8.800


DM = 10    m = 9

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     4.0     3.0     2.0     1.0
LongJobs      3.0     4.0     2.0     5.0     1.0
RandomJobs    2.0     5.0     4.0     3.0     1.0

χr² = 7.200


DM = 10    m = 11

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     2.0     4.0     3.0     1.0
LongJobs      5.0     1.0     2.0     3.0     4.0
RandomJobs    2.0     3.0     5.0     4.0     1.0

χr² = 4.267


DM = 10    m = 13

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     4.0     2.0     3.0     1.0
LongJobs      4.0     1.0     2.0     3.0     5.0
RandomJobs    2.0     5.0     4.0     3.0     1.0

χr² = 1.333


DM = 10    m = 15

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     4.0     3.0     2.0     1.0
LongJobs      4.0     1.0     3.0     2.0     5.0
RandomJobs    5.0     2.5     4.0     2.5     1.0

χr² = 5.133


DM = 20    m = 3

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      3.0     2.0     4.0     5.0     1.0
LongJobs      5.0     4.0     1.0     3.0     2.0
RandomJobs    3.0     4.0     2.0     5.0     1.0

χr² = 6.667


DM = 20    m = 5

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      4.0     5.0     2.0     3.0     1.0
LongJobs      2.0     4.0     5.0     1.0     3.0
RandomJobs    3.0     2.0     5.0     4.0     1.0

χr² = 4.000


DM = 20    m = 7

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     4.0     1.0     3.0     2.0
LongJobs      3.0     4.0     1.0     5.0     2.0
RandomJobs    4.0     2.0     5.0     3.0     1.0

χr² = 4.533


DM = 20    m = 9

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     3.0     2.0     4.0     1.0
LongJobs      5.0     4.0     3.0     2.0     1.0
RandomJobs    3.0     1.0     4.0     5.0     2.0

χr² = 6.133


DM = 20    m = 11

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      2.0     3.0     4.0     5.0     1.0
LongJobs      2.0     5.0     4.0     3.0     1.0
RandomJobs    1.0     3.0     4.0     5.0     2.0

χr² = 9.333


DM = 20    m = 13

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      3.0     2.0     4.0     5.0     1.0
LongJobs      5.0     3.0     1.0     4.0     2.0
RandomJobs    5.0     1.0     2.0     4.0     3.0

χr² = 7.200


DM = 20    m = 15

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      3.0     5.0     2.0     4.0     1.0
LongJobs      4.0     2.0     3.0     5.0     1.0
RandomJobs    1.0     4.0     3.0     5.0     2.0

χr² = 7.467


DM = 30    m = 3

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      2.0     5.0     4.0     3.0     1.0
LongJobs      4.0     2.0     5.0     3.0     1.0
RandomJobs    3.0     4.0     5.0     2.0     1.0

χr² = 8.800


DM = 30    m = 5

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     4.0     2.0     3.0     1.0
LongJobs      5.0     3.0     2.0     4.0     1.0
RandomJobs    2.0     4.0     3.0     5.0     1.0

χr² = 8.267


DM = 30    m = 7

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      2.0     5.0     4.0     3.0     1.0
LongJobs      4.0     3.0     1.0     5.0     2.0
RandomJobs    3.0     5.0     2.0     4.0     1.0

χr² = 7.200


DM = 30    m = 9

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      5.0     4.0     3.0     2.0     1.0
LongJobs      3.0     2.0     5.0     4.0     1.0
RandomJobs    3.0     5.0     2.0     4.0     1.0

χr² = 6.133


DM = 30    m = 11

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      1.0     3.0     5.0     4.0     2.0
LongJobs      4.0     3.0     2.0     5.0     1.0
RandomJobs    3.0     2.0     4.0     5.0     1.0

χr² = 7.467


DM = 30    m = 13

              SJF     SRTF    SCPF    SRCPF   Random
WideJobs      2.0     3.0     5.0     4.0     1.0
LongJobs      3.0     2.0     4.0     5.0     1.0
RandomJobs    5.0     2.0     4.0     3.0     1.0

χr² = 8.800


APPENDIX B

STATISTICAL  TEST  ASSUMPTIONS  AT  90%  CONFIDENCE  LEVEL 


I.  RANDOM JOBS

(1)    (2)    (3)    (4)    (5)             (6)
  5    504      3    NO     SRCPF < RNDM    0.057
  5    504      3    YES    SRCPF < RNDM    0.065
  5    504     15    NO     SCPF < SJF      0.022
  5    504     15    NO     SCPF < SRTF     0.018
  5    504     15    NO     SRCPF < SJF     0.015
  5    504     15    NO     SRCPF < SRTF    0.012
 20    519     15    YES    SRCPF < SJF     0.094
 20    519     15    YES    SRCPF < RNDM    0.012


II.  WIDE JOBS

(1)    (2)    (3)    (4)    (5)             (6)
  5    504     15    NO     SCPF < RNDM     0.042
  5    504     15    NO     SRCPF < RNDM    0.042


III.  LONG JOBS

(1)    (2)    (3)    (4)    (5)             (6)
  5    504      3    NO     SJF < RNDM      0.089
  5    504      9    NO     SCPF < RNDM     0.090
  5    504      9    NO     SCPF < SRTF     0.018
  5    504      9    NO     SCPF < SJF      0.016
  5    504      9    NO     SRCPF < SRTF    0.028
  5    504      9    NO     SRCPF < SJF     0.020
  5    504     15    NO     SCPF < RNDM     0.080
  5    504     15    NO     SCPF < SRTF     0.070
  5    504     15    NO     SCPF < SJF      0.010
  5    504     15    NO     SRCPF < RNDM    0.080
  5    504     15    NO     SRCPF < SRTF    0.070
  5    504     15    NO     SRCPF < SJF     0.090
  5    529     15    YES    SRCPF < RNDM    0.042
  5    529     15    YES    SRTF < RNDM     0.023

Note:  (1) Degree of multiprogramming    (2) Number of jobs
       (3) Number of processors          (4) MSF
       (5) Statistical test              (6) P value